Abstract

We introduce a new notion of graph sparsification based on spectral similarity of graph 
Laplacians: spectral sparsification requires that the Laplacian quadratic form of the sparsifier 
approximate that of the original. This is equivalent to saying that the Laplacian of the 
sparsifier is a good preconditioner for the Laplacian of the original. 

We prove that every graph has a spectral sparsifier of nearly-linear size. Moreover, we
present an algorithm that produces spectral sparsifiers in time $O(m \log^c m)$, where $m$ is the
number of edges in the original graph and $c$ is some absolute constant. This construction is
a key component of a nearly-linear time algorithm for solving linear equations in
diagonally-dominant matrices.

Our sparsification algorithm makes use of a nearly-linear time algorithm for graph
partitioning that satisfies a strong guarantee: if the partition it outputs is very unbalanced, then
the larger part is contained in a subgraph of high conductance.



*This paper is the second in a sequence of three papers expanding on material that appeared first under the title
"Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems" [ST04].
The first paper, "A Local Clustering Algorithm for Massive Graphs and its Application to Nearly-Linear Time
Graph Partitioning" [ST08a], contains graph partitioning algorithms that are used to construct the sparsifiers
in this paper. The third paper, "Nearly-Linear Time Algorithms for Preconditioning and Solving Symmetric,
Diagonally Dominant Linear Systems" [ST08b], contains the results on solving linear equations and approximating
eigenvalues and eigenvectors.

This material is based upon work supported by the National Science Foundation under Grant Nos. 0325630, 
0324914, 0634957, 0635102 and 0707522. Any opinions, findings, and conclusions or recommendations expressed in 
this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. 

Shang-Hua Teng wrote part of this paper while at MSR-NE lab and Boston University. 
1 Introduction



Graph sparsification is the task of approximating a graph by a sparse graph, and is often useful 
in the design of efficient approximation algorithms. Several notions of graph sparsification 
have been proposed. For example, Chew [Che89] was motivated by proximity problems in
computational geometry to introduce graph spanners. Spanners are defined in terms of the 
distance similarity of two graphs: A spanner is a sparse graph in which the shortest-path distance 
between every pair of vertices is approximately the same in the original graph as in the spanner. 
Motivated by cut problems, Benczúr and Karger [BK96] introduced a notion of sparsification
that requires that for every set of vertices, the weight of the edges leaving that set should be 
approximately the same in the original graph as in the sparsifier. 

Motivated by problems in numerical linear algebra and spectral graph theory, we introduce a
new notion of sparsification that we call spectral sparsification. A spectral sparsifier is a subgraph
of the original whose Laplacian quadratic form is approximately the same as that of the original
graph on all real vector inputs. The Laplacian matrix^1 of a weighted graph $G = (V, E, w)$, where
$w_{(u,v)}$ is the weight of edge $(u, v)$, is defined by

\[ L_G(u,v) = \begin{cases} -w_{(u,v)} & \text{if } u \neq v, \\ \sum_{z} w_{(u,z)} & \text{if } u = v. \end{cases} \]

It is better understood by its quadratic form, which on $x \in \mathbb{R}^V$ takes the value

\[ x^T L_G x = \sum_{(u,v) \in E} w_{(u,v)} \left(x(u) - x(v)\right)^2. \tag{1} \]

We say that $\tilde{G}$ is a $\sigma$-spectral approximation of $G$ if for all $x \in \mathbb{R}^V$

\[ \frac{1}{\sigma}\, x^T L_{\tilde{G}} x \le x^T L_G x \le \sigma\, x^T L_{\tilde{G}} x. \tag{2} \]

Our notion of sparsification captures the spectral similarity between a graph and its
sparsifiers. It is a stronger notion than the cut sparsification of Benczúr and Karger: the cut-sparsifiers
constructed by Benczúr and Karger [BK96] are only required to satisfy these inequalities for all
$x \in \{0,1\}^V$. In Section 5 we present an example demonstrating that these notions of
approximation are in fact different.
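To make the quadratic form (1) concrete, here is a minimal Python sketch that builds the Laplacian of a small weighted graph and compares $x^T L_G x$ with the edge-by-edge sum. The helper names (`laplacian`, `quadratic_form`, `edge_sum`) and the dense list-of-rows representation are illustrative assumptions, not part of the paper.

```python
def laplacian(n, weighted_edges):
    """Dense Laplacian (list of rows) of an undirected weighted graph on vertices 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        L[u][v] -= w          # off-diagonal entries are -w_(u,v)
        L[v][u] -= w
        L[u][u] += w          # diagonal entries sum the incident weights
        L[v][v] += w
    return L


def quadratic_form(L, x):
    """Evaluate x^T L x directly from the matrix."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum(weighted_edges, x):
    """Right-hand side of (1): sum of w_(u,v) * (x(u) - x(v))^2 over the edges."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in weighted_edges)
```

On a weighted path with edges `[(0, 1, 2.0), (1, 2, 3.0)]` and `x = [1.0, 0.0, -1.0]`, both evaluations agree, which is the identity that the definition of spectral approximation in (2) relies on.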

Our main result is that every weighted graph has a spectral sparsifier with $\tilde{O}(n)$ edges that
can be computed in $\tilde{O}(m)$ time, where we recall that $\tilde{O}(f(n))$ means $O(f(n) \log^c f(n))$, for
some constant $c$. In particular, we prove that for every weighted graph $G = (V, E, w)$ and every
$\epsilon > 0$, there is a re-weighted subgraph of $G$ with $\tilde{O}(n/\epsilon^2)$ edges that is a $(1 + \epsilon)$-approximation of
$G$. Moreover, we show how to find such a subgraph in $\tilde{O}(m)$ time, where $n = |V|$ and $m = |E|$.
The constants and powers of logarithms hidden in the $\tilde{O}$-notation in the statement of our results
are quite large. Our goal in this paper is not to produce sparsifiers with optimal parameters,
but rather just to prove that spectral sparsifiers with a nearly-linear number of edges exist and
that they can be found in nearly-linear time.



^1 For more information on the Laplacian matrix of a graph, we refer the reader to one of [Bol98], [Moh91], [GR01],
[Chu97].
Our sparsification algorithm makes use of a nearly-linear time graph partitioning algorithm,
ApproxCut, that we develop in Section 8 and which may be of independent interest. On input
a target conductance $\phi$, ApproxCut always outputs a set of vertices of conductance less than
$\phi$. With high probability, if the set it outputs is small then its complement is contained in a
subgraph of high conductance.



2 The Bigger Picture 

This paper arose in our efforts to design nearly-linear time algorithms for solving
diagonally-dominant linear systems, and is the second in a sequence of three papers on the topic. In the
first paper [ST08a], we develop fast routines for partitioning graphs, which we then use in our
algorithms for building sparsifiers. In the last paper [ST08b], we show how to use sparsifiers
to build preconditioners for diagonally-dominant matrices and thereby solve linear equations in
such matrices in nearly-linear time. Koutis, Miller and Peng [KMP10] have recently developed
an algorithm for solving such systems of linear equations in time $\tilde{O}(m \log^2 n)$ that does not rely
upon the sparsifiers of the present paper.

The quality of a preconditioner is measured by the relative condition number, which for the
Laplacian matrices of a graph $G$ and its sparsifier $\tilde{G}$ is

\[ \kappa(G, \tilde{G}) = \left( \max_x \frac{x^T L_G x}{x^T L_{\tilde{G}} x} \right) \left( \max_x \frac{x^T L_{\tilde{G}} x}{x^T L_G x} \right). \]

So, if $\tilde{G}$ is a $\sigma$-spectral approximation of $G$ then $\kappa(G, \tilde{G}) \le \sigma^2$. This means that an iterative
solver such as the Preconditioned Conjugate Gradient [Axe85] can solve a linear system in the
Laplacian of $G$ to accuracy $\epsilon$ by solving $O(\sigma \log(1/\epsilon))$ linear systems in $\tilde{G}$ and performing as
many multiplications by $G$. As a linear system in a matrix with $m$ non-zero entries may be solved
in time $O(nm)$ by using the Conjugate Gradient as a direct method [TB97, Theorem 28.3], the
use of the sparsifiers in this paper alone provides an algorithm for solving linear systems in $L_G$
to $\epsilon$-accuracy in time $\tilde{O}(n^2 \log(1/\epsilon))$, which is nearly optimal when the Laplacian matrix has
$\Omega(n^2)$ non-zero entries. In our paper on solving linear equations [ST08b], we show how to get
the time bound down to $\tilde{O}(m \log(1/\epsilon))$, where $m$ is the number of non-zero entries in $L_G$.
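The bound $\kappa(G, \tilde{G}) \le \sigma^2$ used above follows in two lines from inequality (2). The short derivation below is a sketch that assumes the relative condition number is the product of the two extreme ratios of the quadratic forms, with the maxima taken over vectors for which the denominators are non-zero.

```latex
\begin{align*}
\frac{1}{\sigma}\, x^T L_{\tilde G} x \le x^T L_G x
  &\;\Longrightarrow\; \max_x \frac{x^T L_{\tilde G} x}{x^T L_G x} \le \sigma, \\
x^T L_G x \le \sigma\, x^T L_{\tilde G} x
  &\;\Longrightarrow\; \max_x \frac{x^T L_G x}{x^T L_{\tilde G} x} \le \sigma, \\
\kappa(G, \tilde G)
  &\le \sigma \cdot \sigma = \sigma^2 .
\end{align*}
```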



3 Outline

In Section 4 we present technical background required for this paper, and maybe even for the
rest of this outline. In Section 5 we present three examples of graphs and their sparsifiers.
These examples help motivate key elements of our construction.

There are three components to our algorithm for sparsifying graphs. The first is a random
sampling procedure. In Section 6 we prove that this procedure produces good spectral
sparsifiers for graphs of high conductance. So that we may reduce the problem of sparsifying
arbitrary graphs to that of sparsifying graphs of high conductance, we require a fast algorithm
for partitioning a graph into parts of high conductance without removing too many edges. In
Section 7 we first prove that such partitions exist, and use them to prove the existence of
spectral sparsifiers for all unweighted graphs. In Section 8 we then build on tools from [ST08a]
to develop a graph partitioning procedure that suffices. We use this procedure in Section 9 to
construct a nearly-linear time algorithm for sparsifying unweighted graphs. We show how to use
this algorithm to sparsify weighted graphs in Section 10.

We conclude in Section 11 by surveying recent improvements that have been made in both
sparsification and in the partitioning routines on which the present paper depends.



4 Background and Notation 

By log we always mean the logarithm base 2, and we denote the natural logarithm by ln.

As we spend this paper studying spectral approximations, we will say "$\sigma$-approximation"
instead of "$\sigma$-spectral approximation" wherever it won't create confusion.

We may express (2) more compactly by employing the notation $A \preccurlyeq B$ to mean

\[ x^T A x \le x^T B x, \quad \text{for all } x \in \mathbb{R}^V. \]

Inequality (2) is then equivalent to

\[ \frac{1}{\sigma} L_{\tilde{G}} \preccurlyeq L_G \preccurlyeq \sigma L_{\tilde{G}}. \tag{3} \]

We will overload notation by writing $\tilde{G} \preccurlyeq G$ for graphs $\tilde{G}$ and $G$ to mean $L_{\tilde{G}} \preccurlyeq L_G$.
For two graphs $G$ and $H$, we write

\[ G + H \]

to indicate the graph whose Laplacian is $L_G + L_H$. That is, the weight of every edge in $G + H$
is the sum of the weights of the corresponding edges in $G$ and $H$. We will use this notation even
if $G$ and $H$ have different vertex sets. For example, if their vertex sets are disjoint, then their
sum is simply the disjoint union of the graphs. It is immediate that $\tilde{G} \preccurlyeq G$ and $\tilde{H} \preccurlyeq H$ imply

\[ \tilde{G} + \tilde{H} \preccurlyeq G + H. \]

In many portions of this paper, we will consider vertex-induced subgraphs of graphs. When we 
take subgraphs, we always preserve the identity of vertices. This enables us to sum inequalities 
on the different subgraphs to say something about the original. 

For an unweighted graph $G = (V, E)$, we will let $d_v$ denote the degree of vertex $v$. For $S$ and
$T$ disjoint subsets of $V$, we let $E(S, T)$ denote the set of edges in $E$ connecting one vertex of $S$
with one vertex of $T$. We let $G(S)$ denote the subgraph of $G$ induced on the vertices in $S$: the
graph with vertex set $S$ containing the edges of $E$ between vertices in $S$.

For $S \subseteq V$, we define $\mathrm{Vol}(S) = \sum_{i \in S} d_i$. Observe that $\mathrm{Vol}(V) = 2m$ if $G$ has $m$ edges. The
conductance of a set of vertices $S$, written $\Phi(S)$, is often defined by

\[ \Phi(S) \stackrel{\mathrm{def}}{=} \frac{|E(S, V - S)|}{\min\left(\mathrm{Vol}(S), \mathrm{Vol}(V - S)\right)}. \]

The conductance of $G$ is then given by

\[ \Phi_G \stackrel{\mathrm{def}}{=} \min_{\emptyset \subset S \subset V} \Phi(S). \]
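The definition of conductance can be illustrated by brute force on tiny graphs. The following Python sketch enumerates every non-empty proper subset $S$ and evaluates $|E(S, V - S)| / \min(\mathrm{Vol}(S), \mathrm{Vol}(V - S))$. The function name `conductance` is a hypothetical helper, and the enumeration takes exponential time, so it is only meant for checking small examples, not as a practical algorithm.

```python
from itertools import combinations


def conductance(n, edges):
    """Brute-force conductance of an unweighted graph on vertices 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    total_volume = sum(deg)               # equals 2m
    best = float("inf")
    for size in range(1, n):              # non-empty proper subsets only
        for subset in combinations(range(n), size):
            S = set(subset)
            vol = sum(deg[v] for v in S)
            denom = min(vol, total_volume - vol)
            if denom == 0:                # skip sets of isolated vertices
                continue
            cut = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, cut / denom)
    return best
```

For two triangles joined by a single edge, the minimum is attained by cutting the joining edge: one edge crosses and each side has volume 7.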

The conductance of a graph is related to the smallest non-zero eigenvalue of its Laplacian
matrix, but is even more strongly related to the smallest non-zero eigenvalue of its Normalized
Laplacian matrix (see [Chu97]), whose definition we now recall. Let $D$ be the diagonal matrix
whose $v$-th diagonal is $d_v$. The Normalized Laplacian of the graph $G$, written $\mathcal{L}_G$, is defined by

\[ \mathcal{L}_G = D^{-1/2} L_G D^{-1/2}. \]

It is well-known that both $L_G$ and $\mathcal{L}_G$ are positive semi-definite matrices, with smallest
eigenvalue zero. The eigenvalue zero has multiplicity one if and only if the graph $G$ is connected, in
which case the eigenvector of $L_G$ with eigenvalue zero is the constant vector (see [Bol98, page
269], or derive from (1)).

Our analysis exploits a discrete version of Cheeger's inequality [Che70] (see [Chu97, SJ89,
DS91]), which relates the smallest non-zero eigenvalue of $\mathcal{L}_G$, written $\lambda_2(\mathcal{L}_G)$, to the conductance
of $G$.

Theorem 4.1 (Cheeger's Inequality).

\[ 2\Phi_G \ge \lambda_2(\mathcal{L}_G) \ge \Phi_G^2 / 2. \]

5 A few examples 

5.1 Example 1: Complete Graph 




We first consider what a sparsifier of the complete graph should look like. Let $G$ be the
complete graph on $n$ vertices. All non-zero eigenvalues of $L_G$ equal $n$. So, for every unit vector
$x$ orthogonal to the all-1s vector,

\[ x^T L_G x = n. \]

From Cheeger's inequality, one may prove that graphs with constant conductance, called
expanders, have a similar property. Spectrally speaking, the best of them are the Ramanujan
graphs [LPS88, Mar88], which are $d$-regular graphs all of whose non-zero Laplacian eigenvalues
lie between $d - 2\sqrt{d-1}$ and $d + 2\sqrt{d-1}$. So, if we let $\tilde{G}$ be a Ramanujan graph in which every
edge has been given weight $n/d$, then for every unit vector $x$ orthogonal to the all-1s vector,

\[ x^T L_{\tilde{G}} x \in \left[\, n - \frac{2n\sqrt{d-1}}{d},\; n + \frac{2n\sqrt{d-1}}{d} \,\right]. \]

Thus, $\tilde{G}$ is a $\left(1 - 2\sqrt{d-1}/d\right)^{-1}$-approximation of $G$.
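Both facts used in this example are easy to check numerically. The Python sketch below evaluates the Laplacian quadratic form of the unweighted complete graph on a vector orthogonal to the all-1s vector, and computes the approximation factor $(1 - 2\sqrt{d-1}/d)^{-1}$ for a given degree $d$. The function names are illustrative assumptions; the code does not construct a Ramanujan graph, it only evaluates the formula.

```python
import math


def complete_graph_form(x):
    """x^T L x for the unweighted complete graph: sum over all pairs of (x_u - x_v)^2."""
    n = len(x)
    return sum((x[u] - x[v]) ** 2 for u in range(n) for v in range(u + 1, n))


def ramanujan_sigma(d):
    """Approximation factor (1 - 2*sqrt(d-1)/d)^(-1) from Example 1; requires d >= 3."""
    return 1.0 / (1.0 - 2.0 * math.sqrt(d - 1) / d)
```

For a vector $x$ whose entries sum to zero, `complete_graph_form(x)` equals $n \|x\|^2$, and `ramanujan_sigma(d)` tends to 1 as $d$ grows, which is why a denser Ramanujan graph gives a better approximation.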
5.2 Example 2: Joined Complete Graphs

[Figure. $G$: Two complete graphs joined by an edge. $\tilde{G}$: A good approximation of $G$. Thicker edges
indicate edges of weight 3.]

Next, consider a graph on $2n$ vertices obtained by joining two complete graphs on $n$ vertices
by a single edge, $e$. Let $V_1$ and $V_2$ be the vertex sets of the two complete graphs. We claim that
a good sparsifier for $G$ may be obtained by setting $\tilde{G}$ to be the edge $e$ with weight 1, plus $(n/d)$
times a Ramanujan graph on each vertex set. To prove this, let $G_1$ and $G_2$ denote the complete
graphs on $V_1$ and $V_2$, and let $G_3$ denote the graph just consisting of the edge $e$. Similarly, let $\tilde{G}_1$
and $\tilde{G}_2$ denote $(n/d)$ times a Ramanujan graph on each vertex set, and let $\tilde{G}_3 = G_3$. Recalling
the addition we defined on graphs, we have

\[ G = G_1 + G_2 + G_3, \quad \text{and} \quad \tilde{G} = \tilde{G}_1 + \tilde{G}_2 + \tilde{G}_3. \]

We already know that for $\sigma = \left(1 - 2\sqrt{d-1}/d\right)^{-1}$ and $i \in \{1, 2\}$,

\[ \frac{1}{\sigma} \tilde{G}_i \preccurlyeq G_i \preccurlyeq \sigma \tilde{G}_i. \]

As $\tilde{G}_3 = G_3$, we have

\[ G = G_1 + G_2 + G_3 \preccurlyeq \sigma \tilde{G}_1 + \sigma \tilde{G}_2 + \tilde{G}_3 \preccurlyeq \sigma \tilde{G}_1 + \sigma \tilde{G}_2 + \sigma \tilde{G}_3 = \sigma \tilde{G}. \]

The other inequality follows by similar reasoning. This example demonstrates both the utility
of using edges with different weights, even when sparsifying unweighted graphs, and how we can
combine sparsifiers of subgraphs to sparsify an entire graph. Also observe that every sparsifier
of $G$ must contain the edge $e$, while no other edge is particularly important.

5.3 Example 3: Distinguishing cut sparsifiers from spectral sparsifiers 

Our last example will demonstrate the difference between our notion of sparsification and that
of Benczúr and Karger. We will describe graphs $G$ and $\tilde{G}$ for which $\tilde{G}$ is not a $\sigma$-approximation
of $G$ for any small $\sigma$, but it is a very good sparsifier of $G$ under the definition considered by
Benczúr and Karger. The vertex set $V$ will be $\{0, \ldots, n-1\} \times \{1, \ldots, k\}$, where $n$ is even. The
graph $\tilde{G}$ will consist of $n$ complete bipartite graphs, connecting all pairs of vertices $(u, i)$ and
$(v, j)$ where $v = u \pm 1 \bmod n$. The graph $G$ will be identical to the graph $\tilde{G}$, except that it
will have one additional edge $e$ from vertex $(0, 1)$ to vertex $(n/2, 1)$. As the minimum cut of
$\tilde{G}$ has size $2k$, and $G$ only differs by one edge, $\tilde{G}$ is a $(1 + 1/2k)$-approximation of $G$ in the
notion considered by Benczúr and Karger.

[Figure. $G$: $n = 8$ sets of $k = 4$ vertices arranged in a ring and connected by complete bipartite
graphs, plus one edge across. $\tilde{G}$: A good cut sparsifier of $G$, but a poor spectral sparsifier.]

To show that $\tilde{G}$ is a poor spectral approximation of $G$, consider the vector $x$ given by

\[ x(u, i) = \min(u, n - u). \]

One can verify that

\[ x^T L_{\tilde{G}} x = nk^2, \quad \text{while} \quad x^T L_G x = nk^2 + (n/2)^2. \]

So, inequality (2) is not satisfied for any $\sigma$ less than $1 + n/4k^2$.

6 Sampling Graphs

In this section, we show that if a graph has high conductance, then it may be sparsified by a
simple random sampling procedure. The sampling procedure involves assigning a probability
$p_{i,j}$ to each edge $(i, j)$, and then selecting edge $(i, j)$ to be in the graph $\tilde{G}$ with probability
$p_{i,j}$. When edge $(i, j)$ is chosen to be in the graph, we multiply its weight by $1/p_{i,j}$. As the
graph is undirected, we implicitly assume that $p_{i,j} = p_{j,i}$. Let $A$ denote the adjacency matrix
of the original graph $G$, and $\tilde{A}$ the adjacency matrix of the sampled graph $\tilde{G}$. This procedure
guarantees that

\[ \mathbf{E}\left[\tilde{A}\right] = A. \]

Sampling procedures of this form were examined by Benczúr and Karger [BK96] and Achlioptas
and McSherry [AM01]. Achlioptas and McSherry analyze the approximation obtained by such
a procedure through a bound on the norm of a random matrix of Füredi and Komlós [FK81].
As their bound does not suffice for our purposes, we tighten it by refining the analysis of Füredi
and Komlós.

If $\tilde{G}$ is going to be a sparsifier for $G$, then we must be sure that every vertex in $\tilde{G}$ has edges
attached to it. We guarantee this by requiring that, for some parameter $\Upsilon > 1$,

\[ p_{i,j} = \min\left(1, \frac{\Upsilon}{\min(d_i, d_j)}\right), \quad \text{for all edges } (i, j). \tag{4} \]

The parameter $\Upsilon$ controls the number of edges we expect to find in the graph, and will be set
to at least $\Omega(\log n)$ to ensure that every vertex has an attached edge.

We will show that if $G$ has high conductance and (4) is satisfied for a sufficiently large $\Upsilon$,
then $\tilde{G}$ will be a good sparsifier of $G$ with high probability. The actual theorem that we prove
is slightly more complicated, as it considers the case where we only apply the sampling on a
subgraph of $G$.

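Theorem 6.1 (Sampling High-Conductance Graphs). Let $\epsilon, p \in (0, 1/2)$ and let $G = (V, E)$ be
an unweighted graph whose smallest non-zero normalized Laplacian eigenvalue is at least $\lambda$. Let
$S$ be a subset of the vertices of $G$, let $F$ be the edges in $G(S)$, and let $H = E - F$ be the rest of
the edges. Let

\[ (S, \tilde{F}) = \mathtt{Sample}((S, F), \epsilon, p, \lambda), \]

and let $\tilde{G} = (V, \tilde{F} \cup H)$. Then, with probability at least $1 - p$,

(S.1) $\tilde{G}$ is a $(1 + \epsilon)$-approximation of $G$, and

(S.2) the number of edges in $\tilde{F}$ is at most

\[ \frac{288 \max\left(\log_2(3/p), \log_2 n\right)^2}{(\epsilon\lambda)^2}\, |S|. \]

$\tilde{G}$ = Sample($G$, $\epsilon$, $p$, $\lambda$)

1. Set $k = \max\left(\log_2(3/p), \log_2 n\right)$.

2. Set $\Upsilon = \left(\frac{12k}{\epsilon\lambda}\right)^2$.

3. For every edge $(i, j)$ in $G$, set $p_{i,j} = \min\left(1, \frac{\Upsilon}{\min(d_i, d_j)}\right)$.

4. For every edge $(i, j)$ in $G$, with probability $p_{i,j}$ put an edge of weight $1/p_{i,j}$ between
vertices $(i, j)$ into $\tilde{G}$.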
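The four steps of Sample translate almost line for line into code. The following Python sketch is a minimal rendering under stated assumptions: the graph is an unweighted edge list on vertices `0..n-1`, the eigenvalue bound `lam` is supplied by the caller, and the function name, argument order, and dictionary output format are illustrative choices rather than anything fixed by the paper.

```python
import math
import random


def sample(n, edges, eps, p, lam, rng=None):
    """Sketch of Sample(G, eps, p, lam); returns {(i, j): weight} for the kept edges."""
    rng = rng or random.Random()
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    k = max(math.log2(3.0 / p), math.log2(n))            # step 1
    upsilon = (12.0 * k / (eps * lam)) ** 2              # step 2
    sampled = {}
    for i, j in edges:
        p_ij = min(1.0, upsilon / min(deg[i], deg[j]))   # step 3
        if rng.random() < p_ij:                          # step 4: keep with probability p_ij
            sampled[(i, j)] = 1.0 / p_ij                 # re-weight so the expectation is preserved
    return sampled
```

Because $\Upsilon$ is large, every edge with an endpoint of degree at most $\Upsilon$ is kept with weight 1; sampling only thins out edges whose endpoints both have degree larger than $\Upsilon$.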



Let $D$ be the diagonal matrix of degrees of vertices of $G$. To prove Theorem 6.1, we establish
that the 2-norm of $D^{-1/2}(L_G - L_{\tilde{G}})D^{-1/2}$ is probably small,^2 and then apply the following
lemma.

Lemma 6.2. Let $L$ be the Laplacian matrix of a connected graph $G$, $\tilde{L}$ be the Laplacian of $\tilde{G}$,
and let $D$ be the diagonal matrix of degrees of $G$. If

1. $\lambda_2\left(D^{-1/2} L D^{-1/2}\right) \ge \lambda$, and

2. $\left\| D^{-1/2}(L - \tilde{L})D^{-1/2} \right\| \le \epsilon$,

then $\tilde{G}$ is a $\sigma$-approximation of $G$ for

\[ \sigma = \frac{\lambda}{\lambda - \epsilon}. \]

^2 Recall that the 2-norm of a symmetric matrix is the largest absolute value of its eigenvalues.
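Proof. Let $x$ be any vector and let $y = D^{1/2} x$. By assumption, $G$ is connected and so the
nullspace of the normalized Laplacian $D^{-1/2} L D^{-1/2}$ is spanned by $D^{1/2} \mathbf{1}$. Let $z$ be the projection
of $y$ orthogonal to $D^{1/2} \mathbf{1}$, so

\[ x^T L x = y^T D^{-1/2} L D^{-1/2} y = z^T \left(D^{-1/2} L D^{-1/2}\right) z \ge \lambda \|z\|^2. \tag{5} \]

We compute

\begin{align*}
x^T \tilde{L} x &= y^T D^{-1/2} \tilde{L} D^{-1/2} y \\
&= z^T D^{-1/2} \tilde{L} D^{-1/2} z \\
&= z^T D^{-1/2} L D^{-1/2} z + z^T D^{-1/2} (\tilde{L} - L) D^{-1/2} z \\
&= z^T D^{-1/2} L D^{-1/2} z \left(1 + \frac{z^T D^{-1/2} (\tilde{L} - L) D^{-1/2} z}{z^T D^{-1/2} L D^{-1/2} z}\right) \\
&\ge z^T D^{-1/2} L D^{-1/2} z \left(1 - \frac{\epsilon \|z\|^2}{\lambda \|z\|^2}\right) && \text{(by assumption 2 and (5))} \\
&= \frac{\lambda - \epsilon}{\lambda}\, x^T L x. && \text{(again by (5))}
\end{align*}

We may similarly show that

\[ x^T \tilde{L} x \le \left(\frac{\lambda + \epsilon}{\lambda}\right) x^T L x \le \left(\frac{\lambda}{\lambda - \epsilon}\right) x^T L x. \]

The lemma follows from these inequalities. □

Let $A$ be the adjacency matrix of $G$ and let $\tilde{A}$ be the adjacency matrix of $\tilde{G}$. For each edge
$(i, j)$,

\[ \tilde{A}_{i,j} = \begin{cases} 1/p_{i,j} & \text{with probability } p_{i,j}, \text{ and} \\ 0 & \text{with probability } 1 - p_{i,j}. \end{cases} \]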

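To prove Theorem 6.1, we will observe that

\[ \left\| D^{-1/2}(L - \tilde{L})D^{-1/2} \right\| \le \left\| D^{-1/2}(A - \tilde{A})D^{-1/2} \right\| + \left\| D^{-1/2}(D - \tilde{D})D^{-1/2} \right\|, \]

where $\tilde{D}$ is the diagonal matrix of the diagonal entries of $\tilde{L}$. It will be easy to bound the second
of these terms, so we defer that part of the proof to the end of the section. A bound on the first
term comes from the following lemma.

Lemma 6.3 (Random Subgraph). For all even integers $k$,

\[ \Pr\left[ \left\| D^{-1/2}(\tilde{A} - A)D^{-1/2} \right\| \ge \frac{2k\, n^{1/k}}{\Upsilon^{1/2}} \right] \le 2^{-k}. \]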
Our proof of this lemma applies a modification of techniques introduced by Füredi and
Komlós [FK81] (see also the paper by Vu [Vu07] that corrects some bugs in their work).
However, they consider the eigenvalues of random graphs in which every edge can appear. Some
interesting modifications are required to make an argument such as ours work when
downsampling a graph that may already be sparse. We remark that without too much work one can
generalize Theorem 6.1 so that it applies to weighted graphs.

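Proof of Lemma 6.3. To simplify notation, define

\[ \Delta = D^{-1}(\tilde{A} - A), \]

so for each edge $(i, j)$,

\[ \Delta_{i,j} = \begin{cases} \frac{1}{d_i}\left(\frac{1}{p_{i,j}} - 1\right) & \text{with probability } p_{i,j}, \text{ and} \\ -\frac{1}{d_i} & \text{with probability } 1 - p_{i,j}. \end{cases} \]

Note that $D^{-1/2}(\tilde{A} - A)D^{-1/2}$ has the same eigenvalues as $\Delta$. So, it suffices to bound the
absolute values of the eigenvalues of $\Delta$. Rather than trying to upper bound the eigenvalues of
$\Delta$ directly, we will upper bound a power of $\Delta$'s trace. As the trace of a matrix is the sum of
its eigenvalues, $\mathrm{Tr}(\Delta^k)$ is an upper bound on the $k$th power of every eigenvalue of $\Delta$, for every
even power $k$.

Lemma 6.4 implies that, for even $k$,

\[ \frac{n k^k}{\Upsilon^{k/2}} \ge \mathbf{E}\left[\mathrm{Tr}\left(\Delta^k\right)\right]. \]

Applying Markov's inequality, we obtain

\[ \Pr\left[ \mathrm{Tr}\left(\Delta^k\right) \ge 2^k \frac{n k^k}{\Upsilon^{k/2}} \right] \le 1/2^k. \]

Recalling that the eigenvalues of $\Delta^k$ are the $k$-th powers of the eigenvalues of $\Delta$, and taking
$k$-th roots, we conclude

\[ \Pr\left[ \left\| D^{-1/2}(\tilde{A} - A)D^{-1/2} \right\| \ge \frac{2\, n^{1/k} k}{\Upsilon^{1/2}} \right] \le 1/2^k. \]

□

Lemma 6.4. For even $k$,

\[ \mathbf{E}\left[\mathrm{Tr}\left(\Delta^k\right)\right] \le \frac{n k^k}{\Upsilon^{k/2}}. \]

Proof. Recall that the $(v_0, v_k)$ entry of $\Delta^k$ satisfies

\[ \left(\Delta^k\right)_{v_0, v_k} = \sum_{v_1, \ldots, v_{k-1}} \prod_{i=1}^{k} \Delta_{v_{i-1}, v_i}. \]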
Taking expectations, we obtain

\[ \mathbf{E}\left[\left(\Delta^k\right)_{v_0, v_k}\right] = \sum_{v_1, \ldots, v_{k-1}} \mathbf{E}\left[\prod_{i=1}^{k} \Delta_{v_{i-1}, v_i}\right]. \tag{6} \]

We will now describe a way of coding every sequence $v_1, \ldots, v_{k-1}$ that could possibly contribute
to the sum. Of course, any sequence containing a consecutive pair $(v_{i-1}, v_i)$ for which $\Delta_{v_{i-1}, v_i}$ is
always zero will contribute zero to the sum. So, for a sequence to have a non-zero contribution,
each consecutive pair $(v_{i-1}, v_i)$ must be an edge in the graph $A$. Thus, we can identify every
sequence with non-zero contribution with a walk on the graph $A$ from vertex $v_0$ to vertex $v_k$.

The first idea in our analysis is to observe that most of the terms in this sum are zero. The
reason is that, for all $v_i$ and $v_j$,

\[ \mathbf{E}\left[\Delta_{v_i, v_j}\right] = 0. \]

As $\Delta_{v_i, v_j}$ is independent of every term in $\Delta$ other than $\Delta_{v_j, v_i}$, we see that the term

\[ \mathbf{E}\left[\prod_{i=1}^{k} \Delta_{v_{i-1}, v_i}\right] \tag{7} \]

corresponding to $v_1, \ldots, v_{k-1}$ will be zero unless each edge $(v_{i-1}, v_i)$ appears at least twice (in
either direction).

We now describe a method for coding all walks in which each edge appears at least twice.
We set $T$ to be the set of time steps $i$ at which the edge between $v_{i-1}$ and $v_i$ does not appear
earlier in the walk (in either direction). Note that 1 is always an element of $T$. We then let $\tau$
denote the map from $[k] - T \to T$, indicating for each time step not in $T$ the time step in which
the edge traversed first appeared (regardless of in which direction it is traversed). Note that we
need only consider the cases in which $|T| \le k/2$, as otherwise some edge appears only once in
the walk. To finish our description of a walk, we need a map

\[ \sigma : T \to \{1, \ldots, n\}, \]

indicating the vertex encountered at each time $i \in T$.
For example, for the walk

Step     0   1   2   3   4   5   6   7   8   9   10
Vertex   a   b   c   d   b   c   d   b   e   b   a

we get

\[ T = \{1, 2, 3, 4, 8\}, \qquad
\tau : \begin{array}{l} 5 \mapsto 2 \\ 6 \mapsto 3 \\ 7 \mapsto 4 \\ 9 \mapsto 8 \\ 10 \mapsto 1 \end{array} \qquad
\sigma : \begin{array}{l} 1 \mapsto b \\ 2 \mapsto c \\ 3 \mapsto d \\ 4 \mapsto b \\ 8 \mapsto e \end{array} \]

Using $T$, $\tau$ and $\sigma$, we can inductively reconstruct the sequence $v_1, \ldots, v_{k-1}$ by the rules

• if $i \in T$, $v_i = \sigma(i)$,
• if $i \notin T$, and $v_{i-1} = v_{\tau(i)-1}$, then $v_i = v_{\tau(i)}$, and
• if $i \notin T$, and $v_{i-1} = v_{\tau(i)}$, then $v_i = v_{\tau(i)-1}$.

If $v_{i-1} \notin \{v_{\tau(i)}, v_{\tau(i)-1}\}$, then the tuple $(T, \tau, \sigma)$ does not properly code a walk on the graph of
$A$. We will call $\sigma$ a valid assignment for $T$ and $\tau$ if the above rules do produce a walk on the
graph of $A$ from $v_0$ to $v_k$.
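The coding of a walk by $(T, \tau, \sigma)$ and the reconstruction rules above can be sketched in a few lines of Python. The function names `encode_walk` and `decode_walk`, and the use of a list for $T$ and dictionaries for $\tau$ and $\sigma$, are illustrative assumptions; the sketch does not check that consecutive vertices are adjacent in the underlying graph.

```python
def encode_walk(walk):
    """Code a walk v_0, ..., v_k as (T, tau, sigma)."""
    first_seen = {}                 # undirected edge -> first time step it is traversed
    T, tau, sigma = [], {}, {}
    for i in range(1, len(walk)):
        edge = frozenset((walk[i - 1], walk[i]))
        if edge in first_seen:
            tau[i] = first_seen[edge]      # repeated edge: point back to its first use
        else:
            first_seen[edge] = i
            T.append(i)                    # new edge: record the step ...
            sigma[i] = walk[i]             # ... and the vertex reached at that step
    return T, tau, sigma


def decode_walk(v0, k, T, tau, sigma):
    """Rebuild the walk from its code; return None if the code is not valid."""
    fresh = set(T)
    walk = [v0]
    for i in range(1, k + 1):
        if i in fresh:
            walk.append(sigma[i])
        else:
            s = tau[i]
            if walk[i - 1] == walk[s - 1]:
                walk.append(walk[s])       # same direction as the first traversal
            elif walk[i - 1] == walk[s]:
                walk.append(walk[s - 1])   # opposite direction
            else:
                return None                # (T, tau, sigma) does not code a walk
    return walk
```

Applied to the example walk a, b, c, d, b, c, d, b, e, b, a, the encoder returns the sets and maps listed in the example, and the decoder recovers the walk from them.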
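We have

\[ \mathbf{E}\left[\left(\Delta^k\right)_{v_0, v_k}\right] = \sum_{T, \tau} \; \sum_{\sigma \text{ valid for } T \text{ and } \tau} \mathbf{E}\left[\prod_{i=1}^{k} \Delta_{v_{i-1}, v_i}\right] \]

(where $(v_1, \ldots, v_{k-1})$ is the sequence encoded by $(T, \tau, \sigma)$)

\[ = \sum_{T, \tau} \; \sum_{\sigma \text{ valid for } T \text{ and } \tau} \; \prod_{s \in T} \mathbf{E}\left[\Delta_{v_{s-1}, v_s} \prod_{i : \tau(i) = s} \Delta_{v_{i-1}, v_i}\right]. \tag{8} \]

Each of the terms

\[ \mathbf{E}\left[\Delta_{v_{s-1}, v_s} \prod_{i : \tau(i) = s} \Delta_{v_{i-1}, v_i}\right] \]

is independent of the others, and involves a product of the terms $\Delta_{v_{s-1}, v_s}$ and $\Delta_{v_s, v_{s-1}}$. In
Lemma 6.6, we will prove that

\[ \mathbf{E}\left[\Delta_{v_{s-1}, v_s} \prod_{i : \tau(i) = s} \Delta_{v_{i-1}, v_i}\right] \le \frac{1}{\Upsilon^{|\{i : \tau(i) = s\}|}} \, \frac{1}{d_{v_{s-1}}}, \tag{9} \]

which implies

\[ \sum_{\sigma \text{ valid for } T \text{ and } \tau} \; \prod_{s \in T} \mathbf{E}\left[\Delta_{v_{s-1}, v_s} \prod_{i : \tau(i) = s} \Delta_{v_{i-1}, v_i}\right] \le \frac{1}{\Upsilon^{k - |T|}} \sum_{\sigma \text{ valid for } T \text{ and } \tau} \; \prod_{s \in T} \frac{1}{d_{v_{s-1}}}. \tag{10} \]

To bound the sum of products on the right-hand side of (10), fix $T$ and $\tau$ and consider the
following random process for generating a valid $\sigma$ and corresponding walk: go through the
elements of $T$ in order. For each $s \in T$, pick $\sigma(s)$ to be a random neighbor of the $(s-1)$st vertex
in the walk. If possible, continue the walk according to $\tau$ until it reaches the next step in $T$. If
the process produces a valid $\sigma$, return it. Otherwise, return nothing. The probability that any
particular valid $\sigma$ will be returned by this process is

\[ \prod_{s \in T} \frac{1}{d_{v_{s-1}}}. \]

So,

\[ \sum_{\sigma \text{ valid for } T \text{ and } \tau} \; \prod_{s \in T} \frac{1}{d_{v_{s-1}}} \le 1. \tag{11} \]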
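As there are at most $2^k$ choices for $T$, and at most $|T|^{k - |T|} \le |T|^k$ choices for $\tau$, we
may combine inequalities (10) and (11) with (8) to obtain

\[ \mathbf{E}\left[\left(\Delta^k\right)_{v_0, v_k}\right] \le \frac{(2|T|)^k}{\Upsilon^{k - |T|}} \le \frac{k^k}{\Upsilon^{k/2}} \quad (\text{using } |T| \le k/2). \]

The lemma now follows from

\[ \mathbf{E}\left[\mathrm{Tr}\left(\Delta^k\right)\right] = \sum_{v_0 = 1}^{n} \mathbf{E}\left[\left(\Delta^k\right)_{v_0, v_0}\right]. \]

□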



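Claim 6.5.

\[ |\Delta_{i,j}| \le 1/\Upsilon. \]

Proof. If $p_{i,j} = 1$, then $\Delta_{i,j} = 0$. If not, then we have $\Upsilon / \min(d_i, d_j) = p_{i,j} < 1$. With probability
$1 - p_{i,j}$,

\[ |\Delta_{i,j}| = \frac{1}{d_i} \le \frac{1}{\min(d_i, d_j)} < 1/\Upsilon. \]

On the other hand, with probability $p_{i,j}$,

\[ \Delta_{i,j} = \frac{1}{d_i}\left(\frac{1}{p_{i,j}} - 1\right) \le \frac{1}{d_i\, p_{i,j}} \le \frac{1}{\min(d_i, d_j)\, p_{i,j}} = 1/\Upsilon. \]

As $\Delta_{i,j} > 0$ in this case, we have established $|\Delta_{i,j}| \le 1/\Upsilon$. □

Lemma 6.6. For all edges $(r, t)$ and integers $k \ge 1$ and $l \ge 0$,

\[ \mathbf{E}\left[\Delta_{r,t}^{k} \Delta_{t,r}^{l}\right] \le \frac{1}{\Upsilon^{k + l - 1}} \, \frac{1}{d_r}. \]

Proof. First, if $p_{r,t} = 1$, then $\Delta_{r,t} = 0$. Second, if $k + l = 1$, $\mathbf{E}\left[\Delta_{r,t}^{k} \Delta_{t,r}^{l}\right] = 0$. So, we may restrict
our attention to the case where $k + l \ge 2$ and $p_{r,t} < 1$, which by (4) implies $p_{r,t} = \Upsilon / \min(d_r, d_t)$.
Claim 6.5 tells us that for $k \ge 1$,

\[ \mathbf{E}\left[\Delta_{r,t}^{k} \Delta_{t,r}^{l}\right] \le \frac{1}{\Upsilon}\, \mathbf{E}\left[\left|\Delta_{r,t}^{k-1} \Delta_{t,r}^{l}\right|\right]. \]

A similar statement may be made for $l \ge 1$. So, it suffices to prove the lemma in the case
$k + l = 2$.
As $\Delta_{r,t} = (\tilde{A}_{r,t} - 1)/d_r$ and $\Delta_{t,r} = (\tilde{A}_{r,t} - 1)/d_t$, we have

\begin{align*}
\mathbf{E}\left[\Delta_{r,t}^{k} \Delta_{t,r}^{l}\right]
&= \frac{1}{d_r^k d_t^l}\, \mathbf{E}\left[\left(\tilde{A}_{r,t} - 1\right)^{k+l}\right] \\
&= \frac{1}{d_r^k d_t^l} \left( p_{r,t} \left(\frac{1}{p_{r,t}} - 1\right)^2 + (1 - p_{r,t}) \right) && (\text{using } k + l = 2) \\
&= \frac{1}{d_r^k d_t^l} \left( \frac{1 - p_{r,t}}{p_{r,t}} \right) \\
&\le \frac{1}{d_r^k d_t^l} \left( \frac{1}{p_{r,t}} \right)
 = \frac{1}{d_r^k d_t^l} \left( \frac{\min(d_r, d_t)}{\Upsilon} \right).
\end{align*}

In the case $k = 1$, $l = 1$, we finish the proof by

\[ \frac{\min(d_r, d_t)}{d_r d_t} = \frac{1}{\max(d_r, d_t)} \le \frac{1}{d_r}, \]

and in the case $k = 2$, $l = 0$ by

\[ \frac{\min(d_r, d_t)}{d_r^2} \le \frac{1}{d_r}. \]

□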



This finishes the proofs of Lemmas 6.4 and 6.3. We now turn to the last ingredient we will
need for the proof of Theorem 6.1: a bound on the norm of the difference of the degree matrices.

Lemma 6.7. Let $G$ be a graph and let $\tilde{G}$ be obtained by sampling $G$ with probabilities $p_{i,j}$ that
satisfy (4). Let $D$ be the diagonal matrix of degrees of $G$, and let $\tilde{D}$ be the diagonal matrix of
weighted degrees of $\tilde{G}$. Then,

\[ \Pr\left[ \left\| D^{-1/2}(D - \tilde{D})D^{-1/2} \right\| \ge \epsilon \right] \le 2n e^{-\Upsilon \epsilon^2 / 3}. \]

Proof. Let $\tilde{d}_i$ be the weighted degree of vertex $i$ in $\tilde{G}$. As $D$ and $\tilde{D}$ are diagonal matrices,

\[ \left\| D^{-1/2}(D - \tilde{D})D^{-1/2} \right\| = \max_i \left| 1 - \frac{\tilde{d}_i}{d_i} \right|. \]

As the expectation of $\tilde{d}_i$ is $d_i$ and $\tilde{d}_i$ is a sum of $d_i$ random variables each of which is always 0 or
some value less than $d_i/\Upsilon$, we may apply the variant of the Chernoff bound given in Theorem 6.8
to show that

\[ \Pr\left[ \left| \tilde{d}_i - d_i \right| \ge \epsilon d_i \right] \le 2e^{-\Upsilon \epsilon^2 / 3}. \]

The lemma now follows by taking a union bound over $i$. □

We use the following variant of the Chernoff bound from [Rag88].
Theorem 6.8 (Chernoff Bound). Let $\alpha_1, \ldots, \alpha_n$ all lie in $[0, \beta]$ and let $X_1, \ldots, X_n$ be
independent random variables such that $X_i$ equals $\alpha_i$ with probability $p_i$ and 0 with probability $1 - p_i$.
Let $X = \sum_i X_i$ and $\mu = \mathbf{E}[X] = \sum_i \alpha_i p_i$. Then,

\[ \Pr\left[X > (1 + \epsilon)\mu\right] < \left(\frac{e^{\epsilon}}{(1 + \epsilon)^{1 + \epsilon}}\right)^{\mu/\beta}
\quad \text{and} \quad
\Pr\left[X < (1 - \epsilon)\mu\right] < \left(\frac{e^{\epsilon}}{(1 + \epsilon)^{1 + \epsilon}}\right)^{\mu/\beta}. \]

For $\epsilon < 1$, both of these probabilities are at most $e^{-\mu \epsilon^2 / 3\beta}$.

We remark that Raghavan [Rag88] proved this theorem with $\beta = 1$; the extension to general
$\beta > 0$ follows by re-scaling.
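The way the Chernoff bound is used in Lemma 6.7 can be checked numerically. The Python sketch below evaluates the upper-tail expression $\left(e^{\epsilon}/(1+\epsilon)^{1+\epsilon}\right)^{\mu/\beta}$, the simplified form $e^{-\mu\epsilon^2/3\beta}$, and the per-vertex bound $2e^{-\Upsilon\epsilon^2/3}$ obtained with $\mu = d_i$ and $\beta = d_i/\Upsilon$. The function names are illustrative assumptions, and the code only evaluates the formulas; it does not simulate the random variables.

```python
import math


def raghavan_bound(eps, mu, beta=1.0):
    """(e^eps / (1 + eps)^(1 + eps))^(mu / beta), computed in log space for stability."""
    log_base = eps - (1.0 + eps) * math.log1p(eps)
    return math.exp(log_base * mu / beta)


def simplified_bound(eps, mu, beta=1.0):
    """exp(-mu * eps^2 / (3 * beta)); the theorem states this form for eps < 1."""
    return math.exp(-mu * eps * eps / (3.0 * beta))


def degree_deviation_bound(upsilon, eps):
    """Per-vertex bound 2 * exp(-upsilon * eps^2 / 3), i.e. mu / beta = upsilon."""
    return 2.0 * simplified_bound(eps, mu=upsilon, beta=1.0)
```

Multiplying `degree_deviation_bound(upsilon, eps)` by $n$ gives the union bound stated in Lemma 6.7.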

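Proof of Theorem 6.1. Let $L$ be the Laplacian of $G$, $A$ be its adjacency matrix, and $D$ its
diagonal matrix of degrees. Let $\tilde{L}$, $\tilde{A}$ and $\tilde{D}$ be the corresponding matrices for $\tilde{G}$. The matrices
$L$ and $\tilde{L}$ only differ on rows and columns indexed by $S$. So, if we let $L(S)$ denote the submatrix
of $L$ with rows and columns in $S$, we have

\begin{align*}
\left\| D^{-1/2}(L - \tilde{L})D^{-1/2} \right\|
&= \left\| D(S)^{-1/2}\left(L(S) - \tilde{L}(S)\right)D(S)^{-1/2} \right\| \\
&\le \left\| D(S)^{-1/2}\left(A(S) - \tilde{A}(S)\right)D(S)^{-1/2} \right\|
   + \left\| D(S)^{-1/2}\left(D(S) - \tilde{D}(S)\right)D(S)^{-1/2} \right\|.
\end{align*}

Applying Lemma 6.3 to the first of these terms, while observing

\[ \frac{2k\, n^{1/k}}{\Upsilon^{1/2}} \le \frac{4k}{\Upsilon^{1/2}} = \frac{\epsilon\lambda}{3}, \]

we find

\[ \Pr\left[ \left\| D(S)^{-1/2}\left(A(S) - \tilde{A}(S)\right)D(S)^{-1/2} \right\| \ge \frac{\epsilon\lambda}{3} \right] \le p/3. \]

Applying Lemma 6.7 to the second term, we find

\[ \Pr\left[ \left\| D(S)^{-1/2}\left(D(S) - \tilde{D}(S)\right)D(S)^{-1/2} \right\| \ge \frac{\epsilon\lambda}{3} \right]
\le 2n e^{-\Upsilon(\epsilon\lambda/3)^2/3} \le 2n e^{-(4k)^2/3} \le p/3. \]

Thus, with probability at least $1 - 2p/3$,

\[ \left\| D^{-1/2}(L - \tilde{L})D^{-1/2} \right\| \le \frac{2\epsilon\lambda}{3}, \]

in which case Lemma 6.2 tells us that $\tilde{G}$ is a $\sigma$-approximation of $G$ for

\[ \sigma = \frac{\lambda}{\lambda - (2/3)\epsilon\lambda} \le 1 + \epsilon, \]

using $\epsilon < 1/2$.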
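The last inequality, which is where the restriction to accuracy parameters below $1/2$ enters, can be verified directly. The following short derivation is a sketch of that check for $0 < \epsilon < 1/2$, where $1 - \tfrac{2}{3}\epsilon > 0$.

```latex
\begin{align*}
\frac{\lambda}{\lambda - \tfrac{2}{3}\epsilon\lambda}
  = \frac{1}{1 - \tfrac{2}{3}\epsilon} \le 1 + \epsilon
&\iff 1 \le (1 + \epsilon)\left(1 - \tfrac{2}{3}\epsilon\right)
      = 1 + \tfrac{1}{3}\epsilon - \tfrac{2}{3}\epsilon^2 \\
&\iff \tfrac{2}{3}\epsilon^2 \le \tfrac{1}{3}\epsilon
 \iff \epsilon \le \tfrac{1}{2} .
\end{align*}
```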

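Finally, we use Theorem 6.8 to bound the number of edges in $\tilde{F}$. For each edge $(i, j)$ in $F$,
let $X_{(i,j)}$ be the indicator random variable for the event that edge $(i, j)$ is chosen to appear in
$\tilde{F}$. Using $d_i$ to denote the degree of vertex $i$ in $G(S)$, we have

\[ \mathbf{E}\left[\sum X_{(i,j)}\right] = \sum_{(i,j) \in F} p_{i,j}
\le \Upsilon \sum_{(i,j) \in F} \frac{1}{\min(d_i, d_j)}
\le \Upsilon \sum_{(i,j) \in F} \left(\frac{1}{d_i} + \frac{1}{d_j}\right)
= \Upsilon \sum_{i \in S} \; \sum_{j : (i,j) \in F} \frac{1}{d_i}
= \Upsilon |S|. \]

One may similarly show that $\mathbf{E}\left[\sum X_{(i,j)}\right] \ge \Upsilon |S| / 2$. Applying Theorem 6.8 with $\epsilon = 1$ (note
that here $\epsilon$ is the parameter in the statement of Theorem 6.8), we obtain

\[ \Pr\left[\sum X_{(i,j)} \ge 2\Upsilon |S|\right] \le \left(\frac{e}{4}\right)^{\Upsilon |S| / 2} \le \left(\frac{e}{4}\right)^{(8 \log_2(3/p))^2} \le p/3. \]

□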



7 Graph Decompositions 

In this section, we prove that every graph can be decomposed into components of high
conductance, with a relatively small number of edges bridging the components. A similar result
was obtained independently by Trevisan [Tre05]. We prove this result for three reasons: first,
it enables us to quickly establish the existence of good spectral sparsifiers. Second, our
algorithm for building sparsifiers requires a graph decomposition routine which is inspired by the
computationally infeasible routine presented in this section^3. Finally, the analysis of our
algorithm relies upon Lemma 7.2, which occupies most of this section. Throughout this section, we
will consider an unweighted graph $G = (V, E)$, with $V = \{1, \ldots, n\}$. In the construction of a
decomposition of $G$, we will be concerned with vertex-induced subgraphs of $G$. However, when
measuring the conductance and volumes of vertices in these vertex-induced subgraphs, we will
continue to measure the volume according to the degrees of vertices in the original graph. For
clarity, we define the boundary of a vertex set $S$ with respect to another vertex set $B$ to be

\[ \partial_B(S) = E(S, B - S), \]

we define the conductance of a set $S$ in the subgraph induced by $B \subseteq V$ to be

\[ \Phi^G_B(S) \stackrel{\mathrm{def}}{=} \frac{|E(S, B - S)|}{\min\left(\mathrm{Vol}(S), \mathrm{Vol}(B - S)\right)}, \]

and we define

\[ \Phi^G_B \stackrel{\mathrm{def}}{=} \min_{S \subset B} \Phi^G_B(S). \]

For convenience, we define $\Phi^G_B(\emptyset) = 1$ and, for $|B| = 1$, $\Phi^G_B = 1$.

We introduce the notation $G\{B\}$ to denote the graph $G(B)$ to which self-loops have been
added so that every vertex in $G\{B\}$ has the same degree as in $G$. For $S \subseteq B$,

\[ \Phi_{G\{B\}}(S) = \Phi^G_B(S). \]

Because $\Phi^G_B$ measures volume by degrees in $G$ and those degrees are higher than in $G(B)$,

\[ \Phi^G_B = \Phi_{G\{B\}} \le \Phi_{G(B)}. \]

So, when we prove lower bounds on $\Phi^G_B$, we obtain lower bounds on $\Phi_{G(B)}$.

^3 The routine idealDecomp is infeasible because it requires the solution of an NP-hard problem in step 2. We
could construct sparsifiers from a routine that approximately satisfies the guarantees of idealDecomp, such as the
clustering algorithm of Kannan, Vempala and Vetta [KVV04]. However, their routine could take quadratic time,
which is too slow for our purposes.


7.1 Spectral Decomposition 

We define a decomposition of G to be a partition of V into sets $(A_1, \dots, A_k)$, for some k. We 
say that a decomposition is a $\phi$-decomposition if $\Phi^G_{A_i} \ge \phi$ for all i. We define the boundary of a 
decomposition, written $\partial(A_1, \dots, A_k)$, to be the set of edges between different vertex sets in the 
partition: 

\[ \partial(A_1, \dots, A_k) = E \cap \bigcup_{i \ne j} (A_i \times A_j). \]

We say that a decomposition $(A_1, \dots, A_k)$ is a $\lambda$-spectral decomposition if the smallest 
non-zero normalized Laplacian eigenvalue of $G(A_i)$ is at least $\lambda$, for all i. By Cheeger's inequality 
(Theorem 4.1), every $\phi$-decomposition is a $(\phi^2/2)$-spectral decomposition. 



Theorem 7.1. Let $G = (V, E)$ be a graph and let $m = |E|$. Then, G has a $(6 \log_{4/3} 2m)^{-1}$-decomposition 
with $|\partial(A_1, \dots, A_k)| \le |E|/2$. 



7.2 Existence of spectral sparsifiers 

Before proving Theorem 7.1, we first quickly explain how to use Theorem 7.1 to prove that 
spectral sparsifiers exist. Given any graph G, apply the theorem to find a decomposition of 
the graph into components of conductance $\Omega(1/\log n)$, with at most half of the original edges 
bridging components. Because this decomposition is an $\Omega(1/\log^2 n)$-spectral decomposition, by 
Theorem 6.1 we may sparsify the graph induced on each component by random sampling. The 
average degree in the sparsifier for each component will be $O(\log^6 n)$. It remains to sparsify 
the edges bridging components. If only $O(n)$ edges bridge components, then we do not need 
to sparsify them further. If more edges bridge components, we sparsify them recursively. That 
is, we treat those edges as a graph in their own right, decompose that graph, sample the edges 
induced in its components, and so on. As each of these recursive steps reduces the number of 
edges remaining by at least a factor of two, at most a logarithmic number of recursive steps will 
be required, and thus the average degree of the sparsifier will be at most $O(\log^7 n)$. The above 
process also establishes the following decomposition theorem. 

Recently, Batson, Spielman and Srivastava [BSS09] have shown that $(1+\epsilon)$-spectral sparsifiers 
with $O(n/\epsilon^2)$ edges exist. 

7.3 The Proof of Theorem 7.1 

Theorem 7.1 is not algorithmic. It follows quickly from the following lemma, which says that if 
the largest set with conductance less than $\phi$ is small, then the graph induced on the complement 
has conductance almost $\phi$. This lemma is the key component in our proof of Theorem 7.1, and 
its analog for approximate sparsest cuts (Theorem 8.1) is the key to our algorithm. 

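Lemma 7.2 (Sparsest Cuts as Certificates). Let $G = (V, E)$ be a graph and let $\phi \le 1$. Let 
$B \subseteq V$ and let $S \subset B$ be a set maximizing $\mathrm{Vol}(S)$ among those satisfying 

(C.1) $\mathrm{Vol}(S) \le \mathrm{Vol}(B)/2$, and 
(C.2) $\Phi^G_B(S) \le \phi$. 

If $\mathrm{Vol}(S) = \alpha \mathrm{Vol}(B)$ for $\alpha \le 1/3$, then 

\[ \Phi^G_{B-S} \ge \phi \left( \frac{1 - 3\alpha}{1 - \alpha} \right). \]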

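Proof. Let S be a set of maximum size that satisfies (C.1) and (C.2), let 

\[ \beta = \frac{1 - 3\alpha}{1 - \alpha}, \]

and assume by way of contradiction that $\Phi^G_{B-S} < \phi\beta$. Then, there exists a set $R \subset B - S$ such 
that 

\[ \Phi^G_{B-S}(R) < \phi\beta, \quad \text{and} \quad \mathrm{Vol}(R) \le \tfrac{1}{2} \mathrm{Vol}(B - S). \]

Let $T = R \cup S$. We will prove 

\[ \Phi^G_B(T) < \phi \]

and $\mathrm{Vol}(S) < \min(\mathrm{Vol}(T), \mathrm{Vol}(B - T))$, contradicting the maximality of S. 
We begin by observing that 

\[ |E(T, B-T)| = |E(R \cup S, B - (R \cup S))| \le |E(S, B - S)| + |E(R, B - S - R)| < \phi \mathrm{Vol}(S) + \phi\beta \mathrm{Vol}(R). \tag{12} \]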

We divide the rest of our proof into two cases, depending on whether or not $\mathrm{Vol}(T) \le 
\mathrm{Vol}(B)/2$. First, consider the case in which $\mathrm{Vol}(T) \le \mathrm{Vol}(B)/2$. In this case, T provides a 
contradiction to the maximality of S, as $\mathrm{Vol}(S) < \mathrm{Vol}(T) \le \mathrm{Vol}(B)/2$, and 

\[ |E(T, B-T)| < \phi (\mathrm{Vol}(S) + \mathrm{Vol}(R)) = \phi \mathrm{Vol}(T), \]

which implies 

\[ \Phi^G_B(T) < \phi. \]

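In the case $\mathrm{Vol}(T) > \mathrm{Vol}(B)/2$, we will prove that the set $B - T$ contradicts the maximality 
of S. First, we show 

\[ \mathrm{Vol}(B - T) \ge \left( \frac{1 - \alpha}{2} \right) \mathrm{Vol}(B), \tag{13} \]

which implies $\mathrm{Vol}(B - T) > \mathrm{Vol}(S)$ because we assume $\alpha < 1/3$. To prove (13), compute 

\[
\begin{aligned}
\mathrm{Vol}(T) &= \mathrm{Vol}(S) + \mathrm{Vol}(R) \\
&\le \mathrm{Vol}(S) + (1/2)(\mathrm{Vol}(B) - \mathrm{Vol}(S)) \\
&= (1/2)\mathrm{Vol}(B) + (1/2)\mathrm{Vol}(S) \\
&= \left( \frac{1 + \alpha}{2} \right) \mathrm{Vol}(B).
\end{aligned}
\]

To upper bound the conductance of T, we compute 

\[
\begin{aligned}
|E(T, B-T)| &< \phi \mathrm{Vol}(S) + \phi\beta \mathrm{Vol}(R) \quad \text{(by (12))} \\
&\le \phi \mathrm{Vol}(S) + \phi\beta (\mathrm{Vol}(B) - \mathrm{Vol}(S))/2 \\
&= \phi \mathrm{Vol}(B) (\alpha + \beta(1 - \alpha)/2).
\end{aligned}
\]

So, 

\[ \Phi^G_B(T) = \frac{|E(T, B-T)|}{\min(\mathrm{Vol}(T), \mathrm{Vol}(B - T))} = \frac{|E(T, B-T)|}{\mathrm{Vol}(B - T)} < \frac{\phi \mathrm{Vol}(B)(\alpha + \beta(1 - \alpha)/2)}{\mathrm{Vol}(B)(1 - \alpha)/2} = \phi, \]

by our choice of $\beta$. □ 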
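The last equality in the proof rests on a short computation with $\beta = (1 - 3\alpha)/(1 - \alpha)$. The expansion below is our own spelling-out of that step, not text from the paper:

```latex
\[
\frac{\alpha + \beta(1-\alpha)/2}{(1-\alpha)/2}
  = \frac{2\alpha}{1-\alpha} + \beta
  = \frac{2\alpha + (1 - 3\alpha)}{1-\alpha}
  = \frac{1-\alpha}{1-\alpha}
  = 1 .
\]
```

So the ratio bounding the conductance of T is exactly $\phi$, which is why this particular $\beta$ is the right threshold.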

We will prove Theorem 7.1 by proving that the following procedure produces the required 
decomposition. 

Set $\phi = (2 \log_{4/3} \mathrm{Vol}(V))^{-1}$. 

Note that we initially call this algorithm with $B = V$. 

idealDecomp(B, $\phi$) 

1. If $\Phi^G_B \ge \phi$, then return B. Otherwise, proceed. 

2. Let S be the subset of B maximizing $\mathrm{Vol}(S)$ satisfying (C.1) and (C.2). 

3. If $\mathrm{Vol}(S) \le \mathrm{Vol}(B)/4$, return the decomposition $(B - S, \text{idealDecomp}(S, \phi))$, 

4. else, return the decomposition $(\text{idealDecomp}(B - S, \phi), \text{idealDecomp}(S, \phi))$. 
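The procedure can be tried on very small graphs by exhaustive search. The following is a minimal sketch under our own assumptions: the function names, the edge-list representation, and the example graph are hypothetical, and step 2 is done by enumerating all subsets, which is exponential in |B| and so only illustrates the logic.

```python
import math
from itertools import combinations

def vol(S, deg):
    return sum(deg[v] for v in S)

def conductance(S, B, edges, deg):
    """Phi^G_B(S): cut size inside B over min volume, degrees taken from G."""
    rest = B - S
    cut = sum(1 for u, v in edges
              if (u in S and v in rest) or (v in S and u in rest))
    return cut / min(vol(S, deg), vol(rest, deg))

def ideal_decomp(B, phi, edges, deg):
    B = frozenset(B)
    if len(B) == 1:                      # Phi^G_B = 1 by convention
        return [B]
    subsets = [frozenset(c) for k in range(1, len(B))
               for c in combinations(sorted(B), k)]
    # Step 1: B itself already has conductance at least phi.
    if min(conductance(S, B, edges, deg) for S in subsets) >= phi:
        return [B]
    # Step 2: largest-volume set satisfying (C.1) and (C.2), by brute force.
    candidates = [S for S in subsets
                  if 2 * vol(S, deg) <= vol(B, deg)
                  and conductance(S, B, edges, deg) <= phi]
    S = max(candidates, key=lambda s: vol(s, deg))
    # Step 3: a small cut certifies the complement (Lemma 7.2).
    if 4 * vol(S, deg) <= vol(B, deg):
        return [B - S] + ideal_decomp(S, phi, edges, deg)
    # Step 4: otherwise recurse on both sides.
    return ideal_decomp(B - S, phi, edges, deg) + ideal_decomp(S, phi, edges, deg)

# Hypothetical example: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
deg = {v: sum(v in e for e in edges) for v in range(6)}
parts = ideal_decomp(range(6), 0.2, edges, deg)
```

With the illustrative threshold 0.2 the bridge is cut and the two triangles are returned; with the value of $\phi$ prescribed above the graph is too small to be split at all.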



Proof of Theorem 7.1. To see that the recursive procedure terminates, recall that we have 
defined $\Phi^G_B = 1$ when $|B| = 1$. 

Let $(A_1, \dots, A_k)$ be the output of idealDecomp(V). Lemma 7.2 implies that $\Phi^G_{A_i} \ge \phi/3$ for 
each i. 

To bound the number of edges in $\partial(A_1, \dots, A_k)$, note that the depth of the recursion is at 
most $\log_{4/3} \mathrm{Vol}(V)$ and that at most a $\phi$ fraction of the edges are added to $\partial(A_1, \dots, A_k)$ at 
each level of the recursion. So, 

\[ |\partial(A_1, \dots, A_k)| \le |E| \, \phi \log_{4/3} \mathrm{Vol}(V) \le |E|/2. \]

□ 

8 Approximate Sparsest Cuts 

Unfortunately, it is NP-hard to compute sparsest cuts. So, we cannot directly apply Lemma 7.2 
in the design of our algorithm. Instead, we will apply a nearly-linear time algorithm, ApproxCut, 
that computes approximate sparsest cuts that satisfy an analog of Lemma 7.2, stated in 
Theorem 8.1. Whereas in Lemma 7.2 we proved that if the largest sparse cut is small then its 
complement has high conductance, here we prove that if the cut output by ApproxCut is small, 
then its complement is contained in a subgraph of high conductance. 

The algorithm ApproxCut works by repeatedly calling a routine for approximating sparsest 
cuts, Partition, from [ST08a]. On input a graph that contains a sparse cut, with high 
probability the algorithm Partition either finds a large cut or a cut that has high overlap with 
the sparse cut. We have not been able to find a way to quickly use an algorithm satisfying 
such a guarantee to certify that the complement of a small cut has high conductance. Kannan, 
Vempala and Vetta [KVV04] showed that if we applied such an algorithm until it could not find 
any more cuts then we could obtain such a guarantee. However, such a procedure could require 
quadratic time, which is too slow for our purposes. 

Theorem 8.1 (ApproxCut). Let $\phi, p \in (0, 1)$ and let $G = (V, E)$ be a graph with m edges. Let 
D be the output of ApproxCut(G, $\phi$, p). Then 

(A.1) $\mathrm{Vol}(D) \le (23/25)\mathrm{Vol}(V)$, 
(A.2) If $D \ne \emptyset$ then $\Phi_G(D) \le \phi$, and 
(A.3) With probability at least $1 - p$, either 
    (A.3.a) $\mathrm{Vol}(D) \ge (1/29)\mathrm{Vol}(V)$, or 
    (A.3.b) there exists a set $W \supseteq V - D$ for which $\Phi^G_W \ge f_2(\phi)$, where 

\[ f_2(\phi) \stackrel{\mathrm{def}}{=} \frac{c_2 \phi^2}{\log^4 m}, \tag{14} \]

    for some absolute constant $c_2$. 

The code for ApproxCut follows. It relies on a routine called Partition2 which in turn 
relies on a routine called Partition from [ST08a]. While one could easily combine the routines 
ApproxCut and Partition2, their separation simplifies our analysis. The algorithm Partition2 
is very simple: it just calls Partition repeatedly and collects the cuts it produces until they 
contain at least 1/5 of the volume of the graph or until it has made enough calls. The algorithm 
ApproxCut is similar: it calls Partition2 in the same way that Partition2 calls Partition. 

Moreover, the expected running time of ApproxCut 

D = ApproxCut(G, $\phi$, p), where G is a graph, $\phi, p \in (0, 1)$. 

(0) Set $V_0 = V$ and $j = 0$. 

(1) Set $r = \lceil \log_2(m) \rceil$ and $\epsilon = \min(1/2r, 1/5)$. 

(2) While $j < r$ and $\mathrm{Vol}(V_j) \ge (4/5)\mathrm{Vol}(V)$, 

    (a) Set $j = j + 1$. 

    (b) Set $D_j = \text{Partition2}(G\{V_{j-1}\}, (2/23)\phi, p/2r, \epsilon)$. 

    (c) Set $V_j = V_{j-1} - D_j$. 

(3) Set $D = D_1 \cup \dots \cup D_j$. 

8.1 Partitioning in Nearly-Linear-Time 

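D = Partition2(G, $\theta$, p, $\epsilon$), where G is a graph, $\theta, p \in (0, 1)$ and $\epsilon \in (0, 1)$. 

(0) Set $W_0 = V$ and $j = 0$. Set $r = \lceil \log_2(1/\epsilon) \rceil$. 

(1) While $j < r$ and $\mathrm{Vol}(W_j) \ge (4/5)\mathrm{Vol}(V)$, 

    (a) Set $j = j + 1$. 

    (b) Set $D_j = \text{Partition}(G\{W_{j-1}\}, \theta/9, p/r)$. 

    (c) Set $W_j = W_{j-1} - D_j$. 

(2) Set $D = D_1 \cup \dots \cup D_j$. 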
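The two listings share the same loop structure, which the sketch below makes explicit. It is not an implementation of the paper's algorithms: the routine Partition from [ST08a] is not reproduced here, so `partition` is a caller-supplied placeholder (a hypothetical callable taking a vertex set, a conductance target, and a failure probability), and the names `partition2`, `approx_cut`, `volume`, and `stub` are our own.

```python
import math

def volume(S, deg):
    return sum(deg[v] for v in S)

def partition2(W0, deg, theta, p, eps, partition):
    """Collect cuts from `partition` until 1/5 of the volume is removed or r calls are made."""
    total = volume(W0, deg)               # Vol(V) of the graph passed to Partition2
    r = math.ceil(math.log2(1 / eps))
    W, D, j = set(W0), set(), 0
    while j < r and 5 * volume(W, deg) >= 4 * total:
        j += 1
        Dj = set(partition(frozenset(W), theta / 9, p / r))
        D |= Dj
        W -= Dj
    return D

def approx_cut(V, deg, m, phi, p, partition):
    """Same loop one level up, calling partition2 on the remaining vertices (assumes m >= 2)."""
    r = math.ceil(math.log2(m))
    eps = min(1 / (2 * r), 1 / 5)
    total = volume(V, deg)
    Vj, D, j = set(V), set(), 0
    while j < r and 5 * volume(Vj, deg) >= 4 * total:
        j += 1
        Dj = partition2(Vj, deg, 2 * phi / 23, p / (2 * r), eps, partition)
        D |= Dj
        Vj -= Dj
    return D

# Placeholder standing in for Partition: always cuts off the smallest vertex.
def stub(W, tau, p):
    return {min(W)}

deg = {v: 1 for v in range(10)}           # toy degrees, for exercising the loops only
cut_inner = partition2(range(10), deg, 0.5, 0.1, 0.25, stub)
cut_outer = approx_cut(range(10), deg, 4, 0.5, 0.1, stub)
```

The stub carries none of the guarantees of Theorem 8.2; it only shows when each loop stops, either after r rounds or once less than 4/5 of the volume remains.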

The algorithm Partition from [ST08a] satisfies the following theorem (see [ST08a, Theorem 3.2]). 

Theorem 8.2 (Partition). Let D be the output of Partition(G, $\tau$, p), where G is a graph and 
$\tau, p \in (0, 1)$. Then 

(P.1) $\mathrm{Vol}(D) \le (7/8)\mathrm{Vol}(V)$, 

(P.2) If $D \ne \emptyset$ then $\Phi_G(D) \le \tau$, and 

(P.3) For some absolute constant $c_1$ and 

\[ f_1(\tau) \stackrel{\mathrm{def}}{=} \frac{c_1 \tau^2}{\log^3 m}, \]

    for every set S satisfying 

\[ \mathrm{Vol}(S) \le \mathrm{Vol}(V)/2 \quad \text{and} \quad \Phi_G(S) \le f_1(\tau), \tag{15} \]

    with probability at least $1 - p$ either 
    (P.3.a) $\mathrm{Vol}(D) \ge (1/4)\mathrm{Vol}(V)$, or 
    (P.3.b) $\mathrm{Vol}(S \cap D) \ge \mathrm{Vol}(S)/2$. 

Moreover, the expected running time of Partition is $O(\tau^{-4} m \log^7 m \log(1/p))$. 

If either (P.3.a) or (P.3.b) occur for a set S satisfying (15), we say that Partition succeeds 
for S. Otherwise, we say that it fails. 

One can view condition (A.3) in Theorem 8.1 as reversing the quantifiers in condition (P.3) 
in Theorem 8.2. Theorem 8.2 says that for every set S of low conductance there is a good 
probability that a substantial portion of S is removed. On the other hand, Theorem 8.1 says 
that with high probability all sets of low conductance will be removed. 

The algorithm Partition2 satisfies a guarantee similar to that of Partition, but it 
strengthens condition (P.3.b). 

Lemma 8.3 (Partition2). Let D be the output of Partition2(G, $\theta$, p, $\epsilon$), where G is a graph, 
$\theta, p \in (0, 1)$ and $\epsilon \in (0, 1)$. Then 

(Q.1) $\mathrm{Vol}(D) \le (9/10)\mathrm{Vol}(V)$, 
(Q.2) If $D \ne \emptyset$ then $\Phi_G(D) \le \theta$, and 
(Q.3) For every set S satisfying 

\[ \mathrm{Vol}(S) \le \mathrm{Vol}(V)/2 \quad \text{and} \quad \Phi_G(S) \le f_1(\theta/9), \tag{16} \]

    with probability at least $1 - p$, either 
    (Q.3.a) $\mathrm{Vol}(D) \ge (1/5)\mathrm{Vol}(V)$, or 
    (Q.3.b) $\mathrm{Vol}(S \cap D) \ge (1 - \delta)\mathrm{Vol}(S)$, where $\delta = \max(\epsilon, \Phi_G(S)/f_1(\theta/9))$. 

Moreover, the expected running time of Partition2 is $O(\theta^{-4} m \log^7 m \log(1/\epsilon) \log(\log(1/\epsilon)/p))$. 

If either (Q.3.a) or (Q.3.b) occur for a set S satisfying (16), we say that Partition2 succeeds 
for S. Otherwise, we say that it fails. 

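The proof of this lemma is routine, given Theorem 8.2. 

Proof. Let $j^*$ be such that $D = D_1 \cup \dots \cup D_{j^*}$. To prove (Q.1), let $\nu = \mathrm{Vol}(D_1 \cup \dots \cup D_{j^*-1})/\mathrm{Vol}(V)$. 
As $\mathrm{Vol}(W_{j^*-1}) \ge (4/5)\mathrm{Vol}(V)$, $\nu \le 1/5$. By (P.1), $\mathrm{Vol}(D_{j^*}) \le (7/8)\mathrm{Vol}(W_{j^*-1})$, so 

\[ \mathrm{Vol}(D_1 \cup \dots \cup D_{j^*}) \le \mathrm{Vol}(V)\,(\nu + (7/8)(1 - \nu)) \le \mathrm{Vol}(V)\,((1/5) + (7/8)(4/5)) = (9/10)\mathrm{Vol}(V). \]

To establish (Q.2), we first compute 

\[
\begin{aligned}
|E(D, V - D)| &= \sum_{i=1}^{j^*} |E(D_i, V - D)| \\
&\le \sum_{i=1}^{j^*} |E(D_i, W_{i-1} - D_i)| \\
&\le \sum_{i=1}^{j^*} (\theta/9) \min(\mathrm{Vol}(D_i), \mathrm{Vol}(W_{i-1} - D_i)) \quad \text{(by (P.2) and line 1b of Partition2)} \\
&\le \sum_{i=1}^{j^*} (\theta/9) \mathrm{Vol}(D_i) \\
&= (\theta/9) \mathrm{Vol}(D).
\end{aligned}
\]

So, if $\mathrm{Vol}(D) \le \mathrm{Vol}(V)/2$, then $\Phi_G(D) \le \theta/9$. On the other hand, we established above that 
$\mathrm{Vol}(D) \le (9/10)\mathrm{Vol}(V)$, from which it follows that 

\[ \mathrm{Vol}(V - D) \ge (1/10)\mathrm{Vol}(V) \ge (1/10)(10/9)\mathrm{Vol}(D) = (1/9)\mathrm{Vol}(D). \]

So 

\[ \Phi_G(D) = \frac{|E(D, V - D)|}{\min(\mathrm{Vol}(D), \mathrm{Vol}(V - D))} \le \frac{9\,|E(D, V - D)|}{\mathrm{Vol}(D)} \le \theta. \]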
To prove (Q.3), let S be a set satisfying (16), and let $S_j = S \cap W_j$. From Theorem 8.2, we 
know that with probability at least $1 - p/r$, 

\[ \mathrm{Vol}(S_1) \le (1/2)\mathrm{Vol}(S_0). \tag{17} \]

We need to prove that with probability at least $1 - p$, either $\mathrm{Vol}(W_{j^*}) \le (4/5)\mathrm{Vol}(V)$ or 
$\mathrm{Vol}(S_{j^*}) \le \delta \mathrm{Vol}(S)$. If neither of these inequalities hold, then 

\[ j^* = r, \quad \mathrm{Vol}(W_r) \ge (4/5)\mathrm{Vol}(V), \quad \text{and} \quad \mathrm{Vol}(S_r) > \delta \mathrm{Vol}(S) \ge \epsilon \mathrm{Vol}(S), \]

where we recall $r = \lceil \log_2(1/\epsilon) \rceil$. So, there must exist a j for which $\mathrm{Vol}(S_{j+1}) \ge (1/2)\mathrm{Vol}(S_j)$. If 
$S_j$ satisfied condition (16) in $G\{W_j\}$, this would imply that Partition failed for $S_j$. We already 
know this is unlikely for $j = 0$. To show it is unlikely for $j \ge 1$, we prove that $S_j$ does satisfy 
condition (16) in $G\{W_j\}$. Assuming (17), 

\[ \Phi_{G\{W_j\}}(S_j) = \Phi^G_{W_j}(S_j) = \frac{|\partial_{W_j}(S_j)|}{\min(\mathrm{Vol}(S_j), \mathrm{Vol}(W_j - S_j))} = \frac{|\partial_{W_j}(S_j)|}{\mathrm{Vol}(S_j)} \le \frac{|\partial_V(S)|}{\mathrm{Vol}(S_j)} \le \frac{|\partial_V(S)|}{\delta \mathrm{Vol}(S)} = \frac{\Phi_G(S)}{\delta} \le f_1(\theta/9), \]

where the third equality follows from the assumption $\mathrm{Vol}(S_1) \le (1/2)\mathrm{Vol}(S_0) \le (1/4)\mathrm{Vol}(V)$ 
and the last inequality follows from the definition $\delta = \max(\epsilon, \Phi_G(S)/f_1(\theta/9))$. So, $S_j$ satisfies 
conditions (15) with $\tau = \theta/9$, but Partition fails for $S_j$. As there are at most r sets $S_j$, this 
happens for one of them with probability at most $r(p/r) = p$. 

Finally, the bound on the expected running time of Partition2 is immediate from the bound 
on the running time of Partition. □ 

8.2 Proof of Theorem 8.1 

The rest of this section is devoted to the proof of Theorem 8.1, with all but one line devoted 
to part (A.3). Our goal is to prove the existence of a set of vertices W of high conductance 
that contains all the vertices not cut out by ApproxCut. We will construct this set W in stages. 
Recall that $V_i = V - D_1 \cup \dots \cup D_i$ is the set of vertices that are not removed by the first i cuts. 
In stage i, we will express $W_i$, a superset of $V_i$, as a set of high conductance $U_{i-1}$ plus some 
vertices in $V_i$. We will show that in each stage the volume of the vertices that are not in the set 
of high conductance shrinks by at least a factor of 2. 

We do this by letting $S_i$ be the biggest set of conductance at most $\sigma_i$ in $W_i$, where $\sigma_i$ 
is a factor $(1 - 2\epsilon)$ smaller than the conductance of $U_{i-1}$. We then show that at least a $2\epsilon$ 
fraction of the volume of $S_i$ lies outside $U_{i-1}$ and thus inside $V_i$. From Lemma 7.2 we know 
that $U_i \stackrel{\mathrm{def}}{=} W_i - S_i$ has high conductance. We will use Lemma 8.3 to show that at most an $\epsilon$ 
fraction of $S_i$ appears in $V_{i+1}$. So, the volume of $S_i$ that remains inside $V_{i+1}$ will be at most half 
the volume of $V_i$ that is not in $U_{i-1}$. We then set $W_{i+1} = U_i \cup (S_i \cap V_{i+1})$, and proceed with 
our induction. Eventually, we will arrive at an i for which either $W_i$ has high conductance or 
enough volume has been removed from $V_i$. 



[Figure: The subsets of $W_i$. Not drawn to scale. The shaded portion is $W_{i+1}$. It equals $U_i \cup V_{i+1}$, and so can be viewed as the union of the set of vertices maintained by the algorithm with the high-conductance set we know exists.] 



Formally, we set 

\[ W_0 = V_0 = V \quad \text{and} \quad \sigma_0 = \epsilon f_1(\phi/104). \]

We then construct sets $S_i$, $U_i$ and $W_i$ by the following inductive procedure. 

1. Set $i = 0$. 

2. While $i < r$ and $W_i$ is defined, 

    a. If $W_i$ contains a set $S_i$ such that 

       $\mathrm{Vol}(S_i) \le (1/2)\mathrm{Vol}(W_i)$ and $\Phi^G_{W_i}(S_i) \le \sigma_i$, 

       set $S_i$ to be such a set of maximum size. 
       If $\mathrm{Vol}(S_i) \ge (2/17)\mathrm{Vol}(V)$, stop the procedure and leave $W_{i+1}$ undefined. 
       If there is no such set, set $S_i = \emptyset$, set $U_i = W_i$, stop the procedure and leave $W_{i+1}$ 
       undefined. 

    b. Set $U_i = W_i - S_i$. 

    c. Set $\theta_i = \left(1 - 3\,\mathrm{Vol}(S_i)/\mathrm{Vol}(W_i)\right) \sigma_i$. 

    d. Set $\sigma_{i+1} = (1 - 2\epsilon)\theta_i$. 

    e. Set $W_{i+1} = U_i \cup (S_i \cap V_{i+1})$. 

    f. Set $i = i + 1$. 

3. Set $W = W_i$ where i is the last index for which $W_i$ is defined. 

Note that there may be many choices for a set $S_i$. Once a choice is made, it must be fixed 
for the rest of the procedure so that we can reason about it using Lemma 8.3. 

We will prove that if some set $S_i$ has volume greater than $(2/17)\mathrm{Vol}(V)$, then with high 
probability ApproxCut will return a large cut D, and hence part (A.3.a) is satisfied. Thus, we 
will be mainly concerned with the case in which this does not happen. In this case, we will 
prove that $\theta_i$ is not too much less than $\theta_0$, and so the set $U_i$ has high conductance. If the 
procedure stops because $S_i$ is empty, then $W_i = U_i$ is the set of high conductance we seek. 
We will prove that for some $i \le r$ probably either $S_i$ is empty, $\mathrm{Vol}(S_i) \ge (2/17)\mathrm{Vol}(V)$ or 

\[ \mathrm{Vol}(V_i) \le (16/17)\mathrm{Vol}(V). \]

Claim 8.4. For all i such that $W_{i+1}$ is defined, 

\[ V_{i+1} \subseteq W_{i+1} \subseteq W_i. \]

Proof. We prove this by induction on i. For $i = 0$, we know that $V_i = W_i$. As $W_i = U_i \cup S_i$ and 
the algorithm ensures $V_{i+1} \subseteq V_i$, 

\[ V_{i+1} \subseteq V_i \subseteq W_i = U_i \cup S_i. \]

Thus, 

\[ V_{i+1} \subseteq U_i \cup (S_i \cap V_{i+1}) = W_{i+1} \subseteq U_i \cup S_i = W_i. \]

□ 



Claim 8.5. For all i such that $U_i$ is defined, 

\[ \Phi^G_{U_i} \ge \theta_i. \]

Proof. Follows immediately from Lemma 7.2 and the definitions of $S_i$ and $\theta_i$. □ 



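Lemma 8.6. If 

(a) $\mathrm{Vol}(S_i) \le (2/17)\mathrm{Vol}(V)$, and 

(b) $\mathrm{Vol}(V_{i-1}) \ge (16/17)\mathrm{Vol}(V)$, 

then 

\[ \mathrm{Vol}(S_i \cap (S_{i-1} \cap V_i)) \ge 2\epsilon \mathrm{Vol}(S_i). \]

Proof. This lemma follows easily from the definitions of the sets $S_i$, $U_i$ and $V_i$. As $V_{i-1} \subseteq W_{i-1}$ 
and $\mathrm{Vol}(U_{i-1}) \ge (1/2)\mathrm{Vol}(W_{i-1})$, 

\[ \mathrm{Vol}(U_{i-1}) \ge (1/2)\mathrm{Vol}(V_{i-1}) \ge (8/17)\mathrm{Vol}(V) \ge 4\mathrm{Vol}(S_i). \]

So, we may apply Claim 8.5 to show 

\[ |\partial_{U_{i-1}}(S_i \cap U_{i-1})| \ge \theta_{i-1} \mathrm{Vol}(S_i \cap U_{i-1}). \]

On the other hand, 

\[ |\partial_{U_{i-1}}(S_i \cap U_{i-1})| \le |\partial_{W_i}(S_i)| \le \sigma_i \mathrm{Vol}(S_i) = (1 - 2\epsilon)\theta_{i-1} \mathrm{Vol}(S_i). \]

Combining these two inequalities yields 

\[ \theta_{i-1} \mathrm{Vol}(S_i \cap U_{i-1}) \le (1 - 2\epsilon)\theta_{i-1} \mathrm{Vol}(S_i) \]

and 

\[ \mathrm{Vol}(S_i \cap U_{i-1}) \le (1 - 2\epsilon)\mathrm{Vol}(S_i). \]

As 

\[ S_i \subseteq W_i = U_{i-1} \cup (S_{i-1} \cap V_i), \]

we may conclude 

\[ \mathrm{Vol}(S_i \cap (S_{i-1} \cap V_i)) \ge 2\epsilon \mathrm{Vol}(S_i). \]

□ 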

We now show that if at most an $\epsilon$ fraction of each $S_i$ appears in $V_{i+1}$, then the sets $S_i \cap V_{i+1}$ 
shrink to the point of vanishing. 

Lemma 8.7. If all defined $S_i$ and $V_i$ satisfy 

(a) $\mathrm{Vol}(S_i) \le (2/17)\mathrm{Vol}(V)$, 

(b) $\mathrm{Vol}(V_i) \ge (16/17)\mathrm{Vol}(V)$, and 

(c) $\mathrm{Vol}(S_i \cap V_{i+1}) \le \epsilon \mathrm{Vol}(S_i)$, 

then for all $i \ge 1$ for which $S_i$ is defined, 

\[ \mathrm{Vol}(S_i \cap V_{i+1}) \le (1/2)\mathrm{Vol}(S_{i-1} \cap V_i), \]

and 

\[ \mathrm{Vol}(S_i) \le (1/2)\mathrm{Vol}(S_{i-1}). \]

In particular, the set $S_r$ is empty if it is defined. 

Proof. Lemma 8.6 tells us that 

\[ \epsilon \mathrm{Vol}(S_i) \le (1/2)\mathrm{Vol}(S_i \cap (S_{i-1} \cap V_i)) \le (1/2)\mathrm{Vol}(S_{i-1} \cap V_i). \]

Combining this inequality with (c) yields 

\[ \mathrm{Vol}(S_i \cap V_{i+1}) \le (1/2)\mathrm{Vol}(S_{i-1} \cap V_i). \]

Similarly, we may conclude from Lemma 8.6 that 

\[ \epsilon \mathrm{Vol}(S_{i+1}) \le (1/2)\mathrm{Vol}(S_i \cap V_{i+1}), \]

which when combined with (c) yields 

\[ \epsilon \mathrm{Vol}(S_{i+1}) \le (1/2)\epsilon \mathrm{Vol}(S_i), \]

from which the second part of the lemma follows. 

For $S_i$ to be defined, we must have $\mathrm{Vol}(S_0) \le (2/17)\mathrm{Vol}(V)$; so, 

\[ \mathrm{Vol}(S_r) \le (1/2)^r \mathrm{Vol}(S_0) \le (1/2)^{\lceil \log_2 \mathrm{Vol}(V)/2 \rceil} (2/17)\mathrm{Vol}(V) \le \frac{2}{\mathrm{Vol}(V)} (2/17)\mathrm{Vol}(V) < 1. \]

We conclude that the set $S_r$ must be empty if it is defined. □ 

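This geometric shrinking of the volumes of the sets $S_i$ allows us to prove a lower bound 
on $\theta_i$. 

Lemma 8.8. Under the conditions of Lemma 8.7, 

\[ \theta_i \ge \frac{c_2 \phi^2}{\log^4 m}, \]

for some absolute constant $c_2$. 

Proof. We have 

\[ \theta_i = \sigma_0 (1 - 2\epsilon)^i \prod_{j=0}^{i} \left( 1 - \frac{3\mathrm{Vol}(S_j)}{\mathrm{Vol}(W_j)} \right). \]

As $i < r$ and $\epsilon = \min(1/5, 1/2r)$, we have 

\[ (1 - 2\epsilon)^i \ge 1/e. \]

To analyze the other product, we apply Lemma 8.7 to prove 

\[ \sum_{j=0}^{i} \mathrm{Vol}(S_j) \le 2\mathrm{Vol}(S_0), \]

and so 

\[
\begin{aligned}
\prod_{j=0}^{i} \left( 1 - \frac{3\mathrm{Vol}(S_j)}{\mathrm{Vol}(W_j)} \right)
&\ge 1 - \sum_{j=0}^{i} \frac{3\mathrm{Vol}(S_j)}{\mathrm{Vol}(W_j)} \\
&\ge 1 - \sum_{j=0}^{i} \frac{3\mathrm{Vol}(S_j)}{(16/17)\mathrm{Vol}(V)} \\
&\ge 1 - \frac{2 \cdot 3 \cdot 17\, \mathrm{Vol}(S_0)}{16\, \mathrm{Vol}(V)} \\
&\ge 1 - \frac{2 \cdot 3 \cdot 17}{16} \cdot \frac{2}{17} \\
&= \frac{1}{4}.
\end{aligned}
\]

Thus, 

\[ \theta_i \ge \frac{\sigma_0}{4e} = \frac{\epsilon f_1(\phi/104)}{4e} \ge \frac{c_2 \phi^2}{\log^4 m} \]

for some constant $c_2$. □ 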
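The two numerical facts used in this proof can be sanity-checked directly. The sketch below is our own illustration, not part of the paper: it takes the extreme values allowed by the hypotheses (Vol(S_0) = (2/17)Vol(V), halving at each step, Vol(W_j) = (16/17)Vol(V)) and assumes the exponent on (1 - 2ε) is at most r - 1, which is our reading of the loop condition i < r.

```python
import math

def shrink_product(levels):
    """Product of (1 - 3 Vol(S_j)/Vol(W_j)) in the extreme case permitted by Lemma 8.7."""
    prod = 1.0
    for j in range(levels):
        ratio = (2 / 17) * 0.5 ** j / (16 / 17)   # largest possible Vol(S_j)/Vol(W_j)
        prod *= 1 - 3 * ratio
    return prod

def decay_factor(r):
    """(1 - 2*eps)^(r - 1) with eps = min(1/(2r), 1/5), the smallest value for i < r."""
    eps = min(1 / (2 * r), 1 / 5)
    return (1 - 2 * eps) ** (r - 1)

smallest_product = min(shrink_product(k) for k in range(1, 60))
smallest_decay = min(decay_factor(r) for r in range(1, 2000))
```

Both quantities stay above the constants 1/4 and 1/e used in the proof, which is all the argument needs.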

To prove that condition (c) of Lemma 8.7 is probably satisfied, we will consider two cases. 
First, if $\mathrm{Vol}(S_i \cap V_i) \le \epsilon \mathrm{Vol}(S_i)$ then (c) is trivially satisfied as $V_{i+1} \subseteq V_i$. On the other hand, 
if $\mathrm{Vol}(S_i \cap V_i) \ge \epsilon \mathrm{Vol}(S_i)$, then we will show that $S_i \cap V_i$ satisfies conditions (16) in $G\{V_i\}$, and 
so with high probability the cut $D_{i+1}$ made by Partition2 removes enough of $S_i$. 

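Lemma 8.9. If 

(a) $\mathrm{Vol}(S_i) \le (2/17)\mathrm{Vol}(V)$, 

(b) $\mathrm{Vol}(V_i) \ge (16/17)\mathrm{Vol}(V)$, and 

(c) $\mathrm{Vol}(S_i \cap V_i) \ge \epsilon \mathrm{Vol}(S_i)$, 

then 

\[ \Phi_{G\{V_i\}}(S_i \cap V_i) \le \frac{\epsilon}{\delta} f_1(\phi/104), \]

where $\delta = \mathrm{Vol}(S_i \cap V_i)/\mathrm{Vol}(S_i)$. If, in addition, 

\[ \mathrm{Vol}(S_i \cap V_{i+1}) \le \frac{\epsilon}{\delta} \mathrm{Vol}(S_i \cap V_i), \]

then 

\[ \mathrm{Vol}(S_i \cap V_{i+1}) \le \epsilon \mathrm{Vol}(S_i). \]

Proof. By Claim 8.10, 

\[ |\partial_{V_i}(S_i \cap V_i)| \le |\partial_{W_i}(S_i)|. \]

Set $\delta = \mathrm{Vol}(S_i \cap V_i)/\mathrm{Vol}(S_i)$. Assumption (c) tells us that $\delta \ge \epsilon$. As $\mathrm{Vol}(S_i) \le (1/2)\mathrm{Vol}(V_i)$, 

\[ \Phi_{G\{V_i\}}(S_i \cap V_i) = \frac{|\partial_{V_i}(S_i \cap V_i)|}{\mathrm{Vol}(S_i \cap V_i)} \le \frac{|\partial_{W_i}(S_i)|}{\delta \mathrm{Vol}(S_i)} = \frac{1}{\delta} \Phi_{G\{W_i\}}(S_i) \le \frac{\sigma_i}{\delta} \le \frac{\sigma_0}{\delta} = \frac{\epsilon}{\delta} f_1(\phi/104). \]

The last part of the lemma is trivial. □ 

Claim 8.10. 

\[ \partial_{V_i}(S_i \cap V_i) \subseteq \partial_{W_i}(S_i). \]

Proof. 

\[ \partial_{V_i}(S_i \cap V_i) = E(S_i \cap V_i, V_i - (S_i \cap V_i)) \subseteq E(S_i, V_i - (S_i \cap V_i)) \subseteq E(S_i, W_i - (S_i \cap W_i)) = \partial_{W_i}(S_i). \]

□ 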

We now show that if $\mathrm{Vol}(S_i) \ge (2/17)\mathrm{Vol}(V)$, then in the ith iteration Partition2 will 
probably remove a large portion of the graph. If $\mathrm{Vol}(S_i \cap V_i) \le (1/2)\mathrm{Vol}(V_i)$ we will argue that 
$S_i \cap V_i$ satisfies condition (16) in $G\{V_i\}$. Otherwise, we will argue that $V_i - S_i \cap V_i$ does. 

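Lemma 8.11. If 

(a) $\mathrm{Vol}(V_i) \ge (16/17)\mathrm{Vol}(V)$, 

(b) $\mathrm{Vol}(S_i) \ge (2/17)\mathrm{Vol}(V)$, and 

(c) $\mathrm{Vol}(S_i \cap V_i) \le (1/2)\mathrm{Vol}(V_i)$, 

then 

\[ \Phi_{G\{V_i\}}(S_i \cap V_i) \le 2\epsilon f_1(\phi/104). \]

Moreover, if $\mathrm{Vol}(S_i \cap V_i \cap D_{i+1}) \ge (1 - 2\epsilon)\mathrm{Vol}(S_i \cap V_i)$ then 

\[ \mathrm{Vol}(D_{i+1}) \ge (1/29)\mathrm{Vol}(V). \]

Proof. We first lower-bound the volume of the intersection of $S_i$ with $V_i$ by 

\[ \mathrm{Vol}(S_i \cap V_i) \ge \mathrm{Vol}(S_i) - (\mathrm{Vol}(V) - \mathrm{Vol}(V_i)) \ge \mathrm{Vol}(S_i) - (1/17)\mathrm{Vol}(V) \ge (1/2)\mathrm{Vol}(S_i). \]

We then apply Claim 8.10 to show 

\[ \Phi_{G\{V_i\}}(S_i \cap V_i) = \frac{|\partial_{V_i}(S_i \cap V_i)|}{\mathrm{Vol}(S_i \cap V_i)} \le \frac{|\partial_{W_i}(S_i)|}{(1/2)\mathrm{Vol}(S_i)} \le 2\sigma_i \le 2\epsilon f_1(\phi/104). \]

The last part of the lemma follows from $\mathrm{Vol}(S_i \cap V_i) \ge (1/17)\mathrm{Vol}(V)$ and $\epsilon \le 1/5$. □ 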
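Lemma 8.12. If 

(a) $\mathrm{Vol}(V_i) \ge (16/17)\mathrm{Vol}(V)$ and 

(b) $\mathrm{Vol}(S_i \cap V_i) \ge (1/2)\mathrm{Vol}(V_i)$, 

then 

\[ \Phi_{G\{V_i\}}(S_i \cap V_i) \le 2\epsilon f_1(\phi/104). \]

Moreover, if $\mathrm{Vol}((V_i - (S_i \cap V_i)) \cap D_{i+1}) \ge (1 - \epsilon)\mathrm{Vol}(V_i - (S_i \cap V_i))$ then 

\[ \mathrm{Vol}(D_{i+1}) \ge (3/16)\mathrm{Vol}(V). \]

Proof. As $\mathrm{Vol}(S_i) \le (1/2)\mathrm{Vol}(W_i) \le (1/2)\mathrm{Vol}(V)$ and $\mathrm{Vol}(V_i - S_i \cap V_i) \ge \mathrm{Vol}(V_i) - \mathrm{Vol}(S_i) \ge 
(15/34)\mathrm{Vol}(V)$, 

\[ \mathrm{Vol}(V_i - S_i \cap V_i) \ge (15/17)\mathrm{Vol}(S_i). \]

So, by Claim 8.10, 

\[ \Phi_{G\{V_i\}}(V_i - (V_i \cap S_i)) = \frac{|\partial_{V_i}(V_i \cap S_i)|}{\mathrm{Vol}(V_i - V_i \cap S_i)} \le (17/15) \frac{|\partial_{W_i}(S_i)|}{\mathrm{Vol}(S_i)} \le (17/15)\sigma_i \le 2\epsilon f_1(\phi/104). \]

The last part now follows from 

\[ \mathrm{Vol}(V_i - S_i \cap V_i) \ge (15/17)\mathrm{Vol}(S_i) \ge \frac{15 \cdot 8}{17^2} \mathrm{Vol}(V) \ge (5/16)\mathrm{Vol}(V) \]

and $\epsilon \le 1/5$. □ 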

Proof of Theorem 8.1. The proofs of (A.1) and (A.2) are similar to the proofs of (Q.1) and 
(Q.2). 

To prove (A.3), we will assume that for each set $S_i \cap V_i$ that satisfies conditions (16) in $G\{V_i\}$, 
the call to Partition2 succeeds and that the same holds for all sets $V_i - S_i$ that satisfy 
conditions (16) in $G\{V_i\}$. As this assumption involves at most 2r sets, by Lemma 8.3 it holds 
with probability at least $1 - 2r(p/2r) = 1 - p$. 

If there is an i for which $\mathrm{Vol}(V_i) < (16/17)\mathrm{Vol}(V)$, then $\mathrm{Vol}(D) \ge (1/17)\mathrm{Vol}(V)$ and condition 
(A.3.a) is satisfied. So, we assume that $\mathrm{Vol}(V_i) \ge (16/17)\mathrm{Vol}(V)$ for the rest of the proof. 

Observe that the algorithm ApproxCut calls Partition2 with 

\[ \theta = (2/23)\phi, \]

and that 

\[ \phi/104 \le \theta/9. \]

So, if $\mathrm{Vol}(S_i \cap V_i) \le \mathrm{Vol}(V_i)/2$ and 

\[ \Phi_{G\{V_i\}}(S_i \cap V_i) \le f_1(\phi/104), \]

then $S_i \cap V_i$ satisfies the conditions (16) in $G\{V_i\}$. 

If there is an i for which $\mathrm{Vol}(S_i) \ge (2/17)\mathrm{Vol}(V)$, then by Lemmas 8.11 and 8.12 either $S_i \cap V_i$ 
or $V_i - (S_i \cap V_i)$ satisfies conditions (16) in $G\{V_i\}$ and the success of the call to Partition2 
implies 

\[ \mathrm{Vol}(D) \ge (1/29)\mathrm{Vol}(V). \]

So, for the rest of the proof we may assume $\mathrm{Vol}(S_i) < (2/17)\mathrm{Vol}(V)$. In this case we may 
show that 

\[ \mathrm{Vol}(S_i \cap V_{i+1}) \le \epsilon \mathrm{Vol}(S_i) \tag{18} \]

as follows. If $\mathrm{Vol}(S_i \cap V_i) \le \epsilon \mathrm{Vol}(S_i)$ then (18) trivially holds. Otherwise, Lemma 8.9 tells 
us that $S_i \cap V_i$ satisfies conditions (16) in $G\{V_i\}$ and that the success of the call to Partition2 
guarantees (18). 

We may now apply Lemma 8.7 to show that $S_r$ is empty if it is defined. So, there is an i for 
which $W_i = U_i$ and by Claim 8.5 and Lemma 8.8 

\[ \Phi^G_{W_i} \ge \frac{c_2 \phi^2}{\log^4 m}. \]

As $V - D = V_r \subseteq V_i \subseteq W_i$, the set $W = W_i$ satisfies (A.3.b). □ 

9 Sparsifying Unweighted Graphs 

We now show how to use the algorithms ApproxCut and Sample to sparsify unweighted graphs. 
More precisely, we treat every edge in an unweighted graph as an edge of weight 1. The 
algorithm UnwtedSparsify follows the outline described in Section 7.2. Its main subroutine 
PartitionAndSample calls ApproxCut to partition the graph. Whenever ApproxCut returns a 
small cut, we know that the complement is contained in a subgraph of large conductance. In this 
case, PartitionAndSample calls Sample to sparsify the large part. Whenever the cut returned 
by ApproxCut is large, PartitionAndSample recursively acts on the cut and its complement 
so that it eventually partitions and samples both. The output of PartitionAndSample is the 
result of running Sample on the graphs induced on the vertex sets of a decomposition of the 
original graph. The main routine UnwtedSparsify calls PartitionAndSample and then acts 
recursively to sparsify the edges that go between the parts of the decomposition produced by 
PartitionAndSample. 
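The three-way branching of PartitionAndSample described above can be sketched as a short recursion. This is only a control-flow illustration under our own assumptions: `approx_cut` and `sample` are caller-supplied placeholders standing in for ApproxCut and Sample (neither is implemented here), graphs are represented by their vertex sets alone, and the function and variable names are hypothetical.

```python
def partition_and_sample(V, deg, approx_cut, sample):
    """Return the list of sampled parts, following the small-cut / large-cut case split."""
    V = frozenset(V)
    D = frozenset(approx_cut(V))
    if not D:
        # No cut found: the whole graph is sampled.
        return [sample(V)]
    vol_V = sum(deg[v] for v in V)
    vol_D = sum(deg[v] for v in D)
    if 29 * vol_D <= vol_V:
        # Small cut: sample the large side, recurse only on the cut.
        return [sample(V - D)] + partition_and_sample(D, deg, approx_cut, sample)
    # Large cut: recurse on both sides.
    return (partition_and_sample(V - D, deg, approx_cut, sample)
            + partition_and_sample(D, deg, approx_cut, sample))

# Placeholder cut routines, chosen only to exercise each branch.
halve = lambda V: set(sorted(V)[:len(V) // 2]) if len(V) > 2 else set()
peel = lambda V: {min(V)} if len(V) == 30 else set()
identity = lambda V: V                      # stands in for Sample

deg8 = {v: 1 for v in range(8)}
deg30 = {v: 1 for v in range(30)}
parts_large = partition_and_sample(range(8), deg8, halve, identity)
parts_small = partition_and_sample(range(30), deg30, peel, identity)
```

In the small-cut branch only the cut side is revisited, which is what keeps the recursion depth logarithmic in the volume.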



$\tilde{G}$ = UnwtedSparsify(G, $\epsilon$, p) 

1. If $\mathrm{Vol}(V) \le c_3 \epsilon^{-2} n \log^{30}(n/p)$, return $\tilde{G} = G$ (where $c_3$ is set in the proof of Lemma 9.1). 


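2. Set $\phi = (2 \log_{29/28} \mathrm{Vol}(V))^{-1}$, $\hat{p} = p/(6 n \log_2 n)$, and 
   $\hat{\epsilon} = \dfrac{\epsilon (\ln 2)^2}{(1 + 2\log_{29/28} n)(2 \log n)}$. 

3. Set $(\tilde{G}_1, \dots, \tilde{G}_k)$ = PartitionAndSample(G, $\phi$, $\hat{\epsilon}$, $\hat{p}$). 

4. Let $V_1, \dots, V_k$ be the vertex sets of $\tilde{G}_1, \dots, \tilde{G}_k$, respectively, and let $G_0$ be the graph 
   with vertex set V and edge set $\partial(V_1, \dots, V_k)$. 

5. Set $\tilde{G}_0$ = UnwtedSparsify($G_0$, $\epsilon$, p). 

6. Set $\tilde{G} = \sum_{i=0}^{k} \tilde{G}_i$. 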




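$(\tilde{G}_1, \dots, \tilde{G}_k)$ = PartitionAndSample($G = (V, E)$, $\phi$, $\hat{\epsilon}$, $\hat{p}$) 

0. Set $\lambda = f_2(\phi)^2/2$, where $f_2$ is defined in (14). 

1. Set D = ApproxCut(G, $\phi$, $\hat{p}$). 

2. If $D = \emptyset$, return $\tilde{G}_1$ = Sample(G, $\hat{\epsilon}$, $\hat{p}$, $\lambda$). 

3. Else, if $\mathrm{Vol}(D) \le (1/29)\mathrm{Vol}(V)$, 

    a. Set $\tilde{G}_1$ = Sample($G(V - D)$, $\hat{\epsilon}$, $\hat{p}$, $\lambda$). 

    b. Return ($\tilde{G}_1$, PartitionAndSample($G(D)$, $\phi$, $\hat{\epsilon}$, $\hat{p}$)). 

4. Else, 

    a. Set $\tilde{H}_1, \dots, \tilde{H}_k$ = PartitionAndSample($G(V - D)$, $\phi$, $\hat{\epsilon}$, $\hat{p}$). 

    b. Set $\tilde{I}_1, \dots, \tilde{I}_j$ = PartitionAndSample($G(D)$, $\phi$, $\hat{\epsilon}$, $\hat{p}$). 

    c. Return $(\tilde{H}_1, \dots, \tilde{H}_k, \tilde{I}_1, \dots, \tilde{I}_j)$. 

Lemma 9.1 (PartitionAndSample). Let $G = (V, E)$ be a graph. Let $\tilde{G}_1, \dots, \tilde{G}_k$ be the output 
of PartitionAndSample(G, $\phi$, $\hat{\epsilon}$, $\hat{p}$). Let $V_1, \dots, V_k$ be the vertex sets of $\tilde{G}_1, \dots, \tilde{G}_k$, respectively, 
and let $G_0$ be the graph with vertex set V and edge set $\partial(V_1, \dots, V_k)$. 
Then, 

(PS.1) $|\partial(V_1, \dots, V_k)| \le |E|/2$. 

With probability at least $1 - 3n\hat{p}$, 

(PS.2) the graph 

\[ G_0 + \sum_{i=1}^{k} \tilde{G}_i \]

is a $(1 + \hat{\epsilon})^{1 + \log_{29/28} \mathrm{Vol}(V)}$ approximation of G, and 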

(PS. 3) the total number of edges in Gi, . . . ,Gk is at most c^e~'^ \V\ log^''(n/p), for some absolute 
constant C3. 

Proof. We first observe that whenever the algorithm calls itself recursively, the volume of the 
graph in the recursive call is at most 28/29 of the volume of the input graph. So, the recursion 
depth of the algorithm is at most $\log_{29/28} \mathrm{Vol}(V)$. Property (PS.1) is a consequence of part 
(A.2) of Theorem 8.1 and this bound on the recursion depth. 
We will assume for the rest of the analysis that 

1. for every call to Sample in line 2, $\tilde{G}_1$ is a $(1 + \hat{\epsilon})$ approximation of G and the number of 
   edges in $\tilde{G}_1$ satisfies (S.2), 

2. for every call to Sample in line 3a, $\tilde{G}_1 + G(D) + \partial(D, V - D)$ is a $(1 + \hat{\epsilon})$ approximation 
   of G and the number of edges in $\tilde{G}_1$ satisfies (S.2), and 

3. for every call to ApproxCut in line 1 for which the set D returned satisfies $\mathrm{Vol}(D) \le 
   (1/29)\mathrm{Vol}(V)$, there exists a set W containing $V - D$ for which $\Phi^G_W \ge f_2(\phi)$, where $f_2$ was 
   defined in (14). 

First observe that at most n calls are made to Sample and ApproxCut during the course of the 
algorithm. By Theorem 8.1, the probability that assumption 3 fails is at most $n\hat{p}$. If assumption 
3 never fails, we may apply Theorem 6.1 to prove that assumptions 1 and 2 probably hold, as 
follows. Consider a subgraph $G(V - D)$ on which Sample is called, using $D = \emptyset$ if Sample is 
called on line 2. Assumption 3 tells us that there is a set $W \supseteq V - D$ for which $\Phi^G_W \ge f_2(\phi)$. 
Theorem 4.1 tells us that the smallest non-zero normalized Laplacian eigenvalue of $G(W)$ is at 
least $\lambda$, where $\lambda$ is set in line 0. Treating $G(W)$ as the input graph, and $S = V - D$, we may 
apply Theorem 6.1 to show that assumptions 1 and 2 fail with probability at most $n\hat{p}$ each. Thus, 
all three assumptions hold with probability at least $1 - 3n\hat{p}$. 

Property (PS.3), and the existence of the constant $c_3$, is a consequence of assumptions 1 
and 2. Using these assumptions, we will now establish (PS.2) by induction on the depth of the 
recursion. For a graph G on which PartitionAndSample is called, let d be the maximum depth 
of recursive calls of the algorithm on G, let $\tilde{G}_1, \dots, \tilde{G}_k$ be the output of PartitionAndSample on 
G, and let $V_1, \dots, V_k$ be the vertex sets of $\tilde{G}_1, \dots, \tilde{G}_k$, respectively. We will prove by induction 
on d that 

\[ \sum_{i=1}^{k} \tilde{G}_i + \partial(V_1, \dots, V_k) \text{ is a } (1 + \hat{\epsilon})^{d+1}\text{-approximation of } G. \tag{19} \]

We base our induction on the case in which the algorithm does not call itself, in which case 
it returns the output of Sample in line 2, and the assertion follows from assumption 1. 

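Let D be the set of vertices returned by ApproxCut. If $D \ne \emptyset$, then $d \ge 1$. We first consider 
the case in which $\mathrm{Vol}(D) \le (1/29)\mathrm{Vol}(V)$. In this case, let $H = G(D)$, let $\tilde{H}_1, \dots, \tilde{H}_k$ be the 
graphs returned by the recursive call to PartitionAndSample on H, and let $W_1, \dots, W_k$ be the 
vertex sets of $\tilde{H}_1, \dots, \tilde{H}_k$. Let $H_0$ be the graph on vertex set D with edges $\partial(W_1, \dots, W_k)$. We 
may assume by way of induction that 

\[ H_0 + \sum_{i=1}^{k} \tilde{H}_i \]

is a $(1 + \hat{\epsilon})^d$-approximation of H. We then have 

\[
\begin{aligned}
G &= G(V - D) + H + \partial(V - D, D) \\
&\preccurlyeq (1 + \hat{\epsilon}) \left( \tilde{G}_1 + H + \partial(V - D, D) \right), && \text{by assumption 2,} \\
&\preccurlyeq (1 + \hat{\epsilon}) \left( \tilde{G}_1 + (1 + \hat{\epsilon})^d \Big( \sum_{i=1}^{k} \tilde{H}_i + H_0 \Big) + \partial(V - D, D) \right), && \text{by induction,} \\
&\preccurlyeq (1 + \hat{\epsilon})^{d+1} \left( \tilde{G}_1 + \sum_{i=1}^{k} \tilde{H}_i + H_0 + \partial(V - D, D) \right) \\
&= (1 + \hat{\epsilon})^{d+1} \left( \tilde{G}_1 + \sum_{i=1}^{k} \tilde{H}_i + \partial(V - D, W_1, \dots, W_k) \right).
\end{aligned}
\]

One may similarly prove 

\[ (1 + \hat{\epsilon})^{d+1} G \succcurlyeq \tilde{G}_1 + \sum_{i=1}^{k} \tilde{H}_i + \partial(V - D, W_1, \dots, W_k), \]

establishing (19) for G. 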

We now consider the case in which $\mathrm{Vol}(D) > (1/29)\mathrm{Vol}(V)$. In this case, let $H = G(D)$ and 
$I = G(V - D)$. Let $W_1, \dots, W_k$ be the vertex sets of $\tilde{H}_1, \dots, \tilde{H}_k$ and let $U_1, \dots, U_j$ be the vertex 
sets of $\tilde{I}_1, \dots, \tilde{I}_j$. By our inductive hypothesis, we may assume that $\partial(W_1, \dots, W_k) + \sum_{i=1}^{k} \tilde{H}_i$ is 
a $(1 + \hat{\epsilon})^d$-approximation of H and that $\partial(U_1, \dots, U_j) + \sum_{i=1}^{j} \tilde{I}_i$ is a $(1 + \hat{\epsilon})^d$-approximation of 
I. These two assumptions immediately imply that 

\[ \partial(W_1, \dots, W_k, U_1, \dots, U_j) + \sum_{i=1}^{k} \tilde{H}_i + \sum_{i=1}^{j} \tilde{I}_i \]

is a $(1 + \hat{\epsilon})^d$-approximation of G, establishing (19) in the second case. 

As the recursion depth of this algorithm is bounded by $\log_{29/28} \mathrm{Vol}(V)$, we have established 
property (PS.2). □ 

Lemma 9.2 (UnwtedSparsify). For $\epsilon, p \in (0, 1/2)$ and an unweighted graph G with n vertices, 
let $\tilde{G}$ be the output of UnwtedSparsify(G, $\epsilon$, p). Then, 

(U.1) The edges of $\tilde{G}$ are a subset of the edges of G; and 

with probability at least $1 - p$, 

(U.2) $\tilde{G}$ is a $(1 + \epsilon)$-approximation of G, and 

(U.3) $\tilde{G}$ has at most $c_4 \epsilon^{-2} n \log^{31}(n/p)$ edges, for some constant $c_4$. 

Moreover, the expected running time of UnwtedSparsify is $O(m \log(1/p) \log^{15} n)$. 

Proof. From (PS.1), we know that the depth of the recursion of UnwtedSparsify on G is at 
most $\log_2 \mathrm{Vol}(V) \le 2 \log n$. So, with probability at least 

\[ 1 - (2 \log n) \cdot 3n\hat{p} = 1 - p, \]

properties (PS.2) and (PS.3) hold for the output of PartitionAndSample every time it is called 
by UnwtedSparsify. For the rest of the proof, we assume that this is the case. 

Claim (U.3) follows immediately from (PS.3) and the bound on the recursion depth of 
UnwtedSparsify. We prove claim (U.2) by induction on the recursion depth. In particular, we 
prove that if UnwtedSparsify makes d recursive calls to itself on graph G, then the graph $\tilde{G}$ 
returned is a $(1 + \epsilon \ln 2/(2 \log n))^d$ approximation of G. We base the induction in the case 
where UnwtedSparsify makes no recursive calls to itself, in which case it returns at line 1 with 
a 1-approximation. 

For $d > 0$, we assume for induction that $\tilde{G}_0$ is a $(1 + \epsilon \ln 2/(2 \log n))^{d-1}$-approximation of $G_0$. 
By the assumption that (PS.2) holds, we know that $G_0 + \sum_{i=1}^{k} \tilde{G}_i$ is a 

\[ (1 + \hat{\epsilon})^{1 + \log_{29/28} \mathrm{Vol}(V)} \le 1 + \epsilon \ln 2/(2 \log n) \]

approximation of G, as $\epsilon \ln 2/(2 \log n) \le 1$ (here, we apply the inequality $(1 + x \ln 2/k)^k \le 1 + x$). 
By following the arithmetic in the proof of Lemma 9.1, we may prove that $\tilde{G}_0 + \sum_{i=1}^{k} \tilde{G}_i$ is a 
$(1 + \epsilon \ln 2/(2 \log n))^d$ approximation of G. 
To finish, we observe that 

\[ (1 + \epsilon \ln 2/(2 \log n))^{2 \log n} \le 1 + \epsilon, \]

for $\epsilon \le 1$. 
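The inequality $(1 + x \ln 2/k)^k \le 1 + x$ used twice above holds for $0 \le x \le 1$, since the left side is at most $e^{x \ln 2} = 2^x$ and $2^x \le 1 + x$ on that interval. The following numerical check is our own addition and only samples a grid of values; the names `lhs`, `checks`, and `worst_gap` are illustrative.

```python
import math

def lhs(x, k):
    """Left-hand side (1 + x ln 2 / k)^k of the inequality."""
    return (1 + x * math.log(2) / k) ** k

# Grid of x in [0, 1] and integer k >= 1.
checks = [(x / 100, k) for x in range(0, 101) for k in range(1, 200)]
worst_gap = min((1 + x) - lhs(x, k) for x, k in checks)
```

The gap closes only at x = 0, and at x = 1 it shrinks toward zero as k grows, which is why the restriction to x at most 1 matters.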

Claim (U.1) follows from the observation that the set of edges of the graph output by Sample
is a subset of the set of edges of its input.

To bound the expected running time of UnwtedSparsify, observe that the bound on the
recursion depth of PartitionAndSample implies that its expected running time is at most $O(\log n)$
times the expected running time of ApproxCut with $\phi = \Omega(1/\log n)$, plus the time required to
make the calls to Sample, which is at most $O(m)$.

Another multiplicative factor of $O(\log n)$ comes from the logarithmic number of times that
UnwtedSparsify can call itself during the recursion. □
10 Sparsifying Weighted Graphs



In this section, we show how to sparsify graphs whose edges have arbitrary weights. We begin by
showing how to sparsify weighted graphs whose edge weights are integers in the range $\{1, \ldots, U\}$.
One may also think of this as sparsifying a multigraph. This first result will follow simply from
the algorithm for sparsifying unweighted graphs, at a cost of an $O(\log U)$ factor in the number of
edges in the sparsifier.

We then explain the obstacle to sparsifying arbitrarily weighted graphs and how we overcome 
it. We end the section by proving that it is possible to modify our construction of sparsifiers so 
that for every node the total blow-up in weight of the edges attached to it is bounded. 

10.1 Bounded Weights 

We recall that we treat an unweighted graph as a graph in which every edge has weight 1,
and for clarity we often refer to such a graph as a weight-1 graph. Our algorithm for sparsifying
graphs with weights in $\{1, \ldots, U - 1\}$ works by constructing $\log_2 U$ weight-1 graphs $G_i$ and then
expressing $G$ as a sum of the graphs $2^i G_i$. Each edge of $G$ appears in the graphs $G_i$ for which the $i$th bit
of the binary expansion of the weight of the edge is 1. We sparsify the graphs $G_i$ independently,
and then sum the results.



$\tilde{G}$ = BoundedSparsify$(G, \epsilon, p)$, where $G = (V, E, w)$ has integral weights in $[1, 2^u)$.

1. Decompose $G$ as

$G = \sum_{i=0}^{u-1} 2^i G_i$,

where each $G_i$ is a weight-1 graph.

2. For each $i$, set $\tilde{G}_i$ = UnwtedSparsify$(G_i, \epsilon, p/u)$.

3. Return $\tilde{G} = \sum_i 2^i \tilde{G}_i$.
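Step 1 of BoundedSparsify can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bit_plane_decomposition` and `recombine`, and the representation of a graph as a dictionary from edges to integer weights, are assumptions made for illustration. The sketch places each edge in the weight-1 graph $G_i$ exactly when bit $i$ of its weight is 1, and checks that summing $2^i G_i$ recovers the input.

```python
from collections import defaultdict


def bit_plane_decomposition(weighted_edges):
    """Split an integer-weighted graph into weight-1 graphs G_i with G = sum_i 2^i G_i.

    weighted_edges maps an edge (u, v) to a positive integer weight.
    Returns a dict i -> set of edges whose weight has bit i set.
    """
    planes = defaultdict(set)
    for edge, weight in weighted_edges.items():
        if weight < 1 or int(weight) != weight:
            raise ValueError("weights must be positive integers")
        w = int(weight)
        i = 0
        while w:
            if w & 1:            # bit i of the weight is 1
                planes[i].add(edge)
            w >>= 1
            i += 1
    return dict(planes)


def recombine(planes):
    """Sum 2^i G_i back into a single weighted edge dictionary."""
    total = defaultdict(int)
    for i, edges in planes.items():
        for edge in edges:
            total[edge] += 1 << i
    return dict(total)


# Hypothetical example graph on vertices a, b, c.
example = {("a", "b"): 5, ("b", "c"): 2, ("a", "c"): 7}
planes = bit_plane_decomposition(example)
```

In the algorithm each plane would then be handed to UnwtedSparsify separately. The sketch stops before that step, since the sparsifier itself is outside its scope.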



Lemma 10.1 (BoundedSparsify). For $\epsilon, p \in (0, 1/2)$ and a graph $G$ with integral weights and
with $n$ vertices, let $\tilde{G}$ be the output of BoundedSparsify$(G, \epsilon, p)$. Let $U - 1$ be the maximum
weight of an edge in $G$. Then,

(B.1) The edges of $\tilde{G}$ are a subset of the edges of $G$; and,
with probability at least $1 - p$,
(B.2) $\tilde{G}$ is a $(1 + \epsilon)$-approximation of $G$, and
(B.3) G has at most C4e~^nlog C/log'^"'^(n/p) edges. 

Moreover, the expected running time of BoundedSparsif y is O (mlog C/log(l/p) log^^ n). 
Proof. Immediate from Lemma 9.2. □
10.2 Coping with Arbitrary Weights: Graph Contraction

When faced with an arbitrary weighted graph, we will first approximate the weight of every
edge by the sum of a few powers of two. However, if the weights are arbitrary, many different
powers of two could be required, and we could not construct a sparsifier by treating each power
of two separately as we did in BoundedSparsify. To get around this problem, we observe that
when we are considering edges of a given weight, we can assume that all edges of much greater
weight have been contracted. We formalize this idea in Lemma 10.2.

By exploiting this idea, we are able to sparsify arbitrary weighted graphs with at most an
$O(\log(1/\epsilon))$ factor more edges than employed in BoundedSparsify when $U = n$. Our technique
is inspired by how Benczúr and Karger [BK96] built cut sparsifiers for weighted graphs out of
cut sparsifiers for unweighted graphs.

Given a weighted graph $G = (V, E, w)$ and a partition $V_1, \ldots, V_k$ of $V$, we define the map of
the partition to be the function

$\pi : V \to \{1, \ldots, k\}$

for which $\pi(u) = i$ if $u \in V_i$. We define the contraction of $G$ under $\pi$ to be the weighted graph
$H = (\{1, \ldots, k\}, F, z)$, where $F$ consists of edges of the form $(\pi(u), \pi(v))$ for $(u, v) \in E$, and
where the weight of edge $(i, j) \in F$ is

$z(i, j) = \sum_{(u,v)\,:\,\pi(u) = i,\ \pi(v) = j} w(u, v)$.

We do not include self-loops in the contraction, so edges $(u, v) \in E$ for which $\pi(u) = \pi(v)$ do
not appear in the contraction.

Given a weighted graph $H = (\{1, \ldots, k\}, F, z)$, we say that $G = (V, E, w)$ is a pullback of $H$
under $\pi$ if

1. $H$ is the contraction of $G$ under $\pi$, and

2. for every edge $(i, j) \in F$, $E$ contains exactly one edge $(u, v)$ for which $\pi(u) = i$ and
$\pi(v) = j$.
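The two definitions above can be made concrete with a short sketch. It assumes graphs are stored as dictionaries from vertex pairs to weights and that cluster identifiers are comparable integers; the names `contract` and `pullback` are hypothetical and are not taken from the paper. The pullback below simply picks the first crossing edge in sorted order as the representative of each contracted edge, which is one valid choice among many.

```python
def contract(weighted_edges, pi):
    """Contraction of a weighted graph under the partition map pi (vertex -> cluster id)."""
    z = {}
    for (u, v), w in weighted_edges.items():
        i, j = pi[u], pi[v]
        if i == j:
            continue                      # self-loops are not included
        key = (min(i, j), max(i, j))
        z[key] = z.get(key, 0) + w        # parallel edges add their weights
    return z


def pullback(contracted_edges, original_edges, pi):
    """Give each contracted edge's weight to exactly one original edge that maps onto it."""
    representative = {}
    for (u, v) in sorted(original_edges):
        i, j = pi[u], pi[v]
        if i != j:
            representative.setdefault((min(i, j), max(i, j)), (u, v))
    return {representative[key]: w for key, w in contracted_edges.items()}


# Hypothetical example: clusters {1, 2} and {3, 4} joined by three weight-1 edges.
edges = {(1, 2): 10, (3, 4): 10, (1, 3): 1, (2, 3): 1, (2, 4): 1}
pi = {1: 1, 2: 1, 3: 2, 4: 2}
H = contract(edges, pi)
G0_tilde = pullback(H, edges, pi)
```

Contracting `G0_tilde` under the same map returns `H` again, which is condition 1 of the pullback definition.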

In the following lemma, we consider a graph in which each of the vertex sets $V_1, \ldots, V_k$ is
connected by edges of high weight while all the edges that go between these sets have low weight.
We show that one can sparsify the low-weight edges by taking a pullback of an approximation
of the contraction of the graph.

Lemma 10.2 (Pullback). Let $G = (V, E, w)$ be a weighted graph, let $V_1, \ldots, V_k$ be a partition of
$V$, and let $\pi$ be the map of the partition. Set $E_0 = \partial(V_1, \ldots, V_k)$, $G_0 = (V, E_0, w)$, $E_1 = E - E_0$,
and $G_1 = (V, E_1, w)$. For some $\epsilon \le 1/2$, let $\tilde{G}_0$ be a pullback under $\pi$ of a $(1 + \epsilon)$-approximation
of the contraction of $G_0$ under $\pi$. Assume that, for some $c \ge 3$,

1. each set of vertices $V_i$ is connected by edges in $E_1$,

2. every edge in $E_1$ has weight at least $c^2 n^3$, and

3. every edge in $E_0$ has weight 1.

Then, $\tilde{G}_0 + G_1$ is an $\alpha$-approximation of $G$, for

$\alpha = (1 + \epsilon)(1 + 1/c)^2$.

Our proof of Lemma 10.2 uses the following lemma bounding how well a path preconditions
an edge. It is an example of a Poincaré inequality [DS91], and it may be derived from the
Rank-One Support Lemma of [BH03], the Congestion-Dilation Lemma of [BGH+06], or the
Path Lemma of [ST08b]. We include a proof for convenience.

Lemma 10.3. Let $(u, v)$ be an edge of weight 1, and let $F$ consist of a path from $u$ to $v$ in which
the edges on the path have weights $w_1, \ldots, w_k$. Then,

$(u, v) \preccurlyeq (1/w_1 + \cdots + 1/w_k) F$.

Proof. Name the vertices on the path 0 through $k$ with vertex 0 replacing $u$ and vertex $k$
replacing $v$. Let $w_i$ denote the weight of edge $(i, i - 1)$. We need to prove that for every vector
$x$,

$(x(k) - x(0))^2 \le \left(\sum_{i=1}^{k} \frac{1}{w_i}\right) \sum_{i=1}^{k} w_i (x(i) - x(i-1))^2$.

For $1 \le i \le k$ set $y(i) = \sqrt{w_i}\,(x(i) - x(i-1))$. The Cauchy-Schwarz inequality now tells us that

$(x(k) - x(0))^2 = \left(\sum_{i=1}^{k} \frac{y(i)}{\sqrt{w_i}}\right)^2 \le \left(\sum_{i=1}^{k} \frac{1}{w_i}\right)\left(\sum_{i=1}^{k} y(i)^2\right) = \left(\sum_{i=1}^{k} \frac{1}{w_i}\right)\left(\sum_{i=1}^{k} w_i (x(i) - x(i-1))^2\right)$,

as required. □
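As a sanity check on Lemma 10.3, the quadratic-form inequality can be tested numerically. The sketch below is only an illustration under stated assumptions: the helper name `path_bound_gap`, the random weight range, and the number of trials are arbitrary choices. It evaluates the right-hand side minus the left-hand side for random weights and potentials, and also for the potentials $x(i) - x(i-1) = 1/w_i$, at which Cauchy-Schwarz holds with equality.

```python
import random


def path_bound_gap(weights, x):
    """Return RHS - LHS of the inequality of Lemma 10.3 for potentials x on a path 0..k."""
    k = len(weights)
    assert len(x) == k + 1
    lhs = (x[k] - x[0]) ** 2
    resistance = sum(1.0 / w for w in weights)
    energy = sum(w * (x[i + 1] - x[i]) ** 2 for i, w in enumerate(weights))
    return resistance * energy - lhs


random.seed(0)
# Smallest gap seen over random paths with 5 edges; it should never be negative
# beyond floating-point rounding.
worst = min(
    path_bound_gap(
        [random.uniform(0.1, 10.0) for _ in range(5)],
        [random.gauss(0.0, 1.0) for _ in range(6)],
    )
    for _ in range(1000)
)

# Tight case: potential drops proportional to 1/w_i.
tight = path_bound_gap([1.0, 2.0, 4.0], [0.0, 1.0, 1.5, 1.75])
```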



Proof of Lemma 10.2. Let $H$ be the contraction of $G_0$ under $\pi$, and let $\tilde{H}$ be the
$(1 + \epsilon)$-approximation of $H$ for which $\tilde{G}_0$ is a pullback.

We begin the proof by choosing an arbitrary vertex $v_i$ in each set $V_i$. Now, let $F$ be the
weighted graph on vertex set $\{v_1, \ldots, v_k\}$ isomorphic to $H$ under the map $i \mapsto v_i$, and let $\tilde{F}$ be
the analogous graph for $\tilde{H}$. Our analysis will go through an examination of the graphs

$I = F + G_1$ and $\tilde{I} = \tilde{F} + G_1$.

The lemma is a consequence of the following three statements, which we will prove momentarily:

(a) $I$ is a $(1 + 1/c)$-approximation of $G$.

(b) $\tilde{I}$ is a $(1 + \epsilon)$-approximation of $I$.

(c) $\tilde{I}$ is a $(1 + 1/c)$-approximation of $\tilde{G}_0 + G_1$.

To prove claim (a), consider any edge $(a, b) \in E_0$. As $\pi(a) \ne \pi(b)$, the graph $\frac{1}{cn^2} G_1$ contains a
path from $a$ to $v_{\pi(a)}$ and a path from $b$ to $v_{\pi(b)}$. The sum of the lengths of these paths is at most
$n$, and each edge on each path has weight at least $cn$. So, if we let $f$ denote an edge of weight
1 from $v_{\pi(a)}$ to $v_{\pi(b)}$, then Lemma 10.3 tells us that

$(a, b) \preccurlyeq (1/1 + n/cn)\left(f + \frac{1}{cn^2} G_1\right) = (1 + 1/c)\left(f + \frac{1}{cn^2} G_1\right)$, (20)

and

$f \preccurlyeq (1 + 1/c)\left[(a, b) + \frac{1}{cn^2} G_1\right]$. (21)



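As there are fewer than $n^2/2$ edges in $E_0$, we may sum (20) over all of them to establish

$G_0 \preccurlyeq (1 + 1/c)\left[F + \frac{1}{2c} G_1\right]$.

So,

$G_0 + G_1 \preccurlyeq (1 + 1/c)\left[F + \frac{1}{2c} G_1\right] + G_1 \preccurlyeq (1 + 1/c)\left[F + G_1\right]$,

as $c \ge 1$. The inequality

$F + G_1 \preccurlyeq (1 + 1/c)\left[G_0 + G_1\right]$,

and thus part (a), may be established by similarly summing over inequality (21).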

Part (b) is immediate from the facts that $\tilde{F}$ is a $(1 + \epsilon)$-approximation of $F$, that $I = F + G_1$,
and that $\tilde{I} = \tilde{F} + G_1$.

Part (c) is very similar to part (a). We first note that the sum of the weights of edges in $\tilde{F}$ is
at most $(1 + \epsilon)$ times the sum of the weights of edges in $F$, and so is at most $(1 + \epsilon)n^2/2$. Now,
for each edge $(a, b)$ in $\tilde{G}_0$ of weight $w$, there is a corresponding edge $(v_{\pi(a)}, v_{\pi(b)})$ of weight $w$ in
$\tilde{F}$. Let $e$ denote the edge $(a, b)$ of weight $w$ and let $f$ denote the edge $(v_{\pi(a)}, v_{\pi(b)})$ of weight $w$.
As in the proof of part (a), we have

$e \preccurlyeq (1 + 1/c)\left[f + \frac{w}{cn^2} G_1\right]$

and

$f \preccurlyeq (1 + 1/c)\left[e + \frac{w}{cn^2} G_1\right]$.

Summing these inequalities over all edges in $\tilde{G}_0$, adding $G_1$ to each side, and recalling $\epsilon \le 1/2$
and $c \ge 3$, we establish part (c). □

We now state the algorithm Sparsify. For simplicity of exposition, we assume that the 
weights of edges in its input are all at most 1. However, this is not a restriction as one can scale 
down the weights of any graph to satisfy this requirement, apply Sparsify, and then scale back 
up. 

The algorithm Sparsify first replaces each weight $w_e$ with its truncation to its few most
significant bits, $z_e$. The resulting modified graph is called $\hat{G}$. As $z_e$ is very close to $w_e$, little is
lost by this substitution. As in BoundedSparsify, $\hat{G}$ is represented as a sum of graphs $2^{-i} G^i$
where each $G^i$ is a weight-1 graph. Because the weight of every edge in $\hat{G}$ only has a few bits,
each edge only appears in a few of the graphs $G^i$.

Our first instinct would be to sparsify each of the graphs $G^i$ individually. However, this could
result in too many edges as sparsifying produces a graph whose number of edges is proportional
to its number of vertices, and the sum over $i$ of the number of vertices in each $G^i$ could be large.
To get around this problem, we contract all edges of much higher weight before sparsifying. In
particular, the algorithm Sparsify partitions the vertices into components that are connected
by edges of much higher weight. It then replaces each $G^i$ with a pullback of a sparsifier of the
contraction of $G^i$ under this partition. In Lemma 10.4 we prove that the sum over $i$ of the
number of vertices in the contraction of each $G^i$ will only be a small multiple of $n$.

$\tilde{G}$ = Sparsify$(G, \epsilon, p)$, where $G = (V, E, w)$ and $w(e) \le 1$ for all $e \in E$.

0. Set $Q = \lceil 6/\epsilon \rceil$, $b = 6/\epsilon$, $c = 6/\epsilon$, $\hat{\epsilon} = \epsilon/6$, and $l = \lceil \log_2 2bc^2n^3 \rceil$.

1. For each edge $e \in E$,

a. choose $r_e$ so that $Q \le 2^{r_e} w_e < 2Q$,

b. let $q_e$ be the largest integer such that $q_e 2^{-r_e} \le w_e$ (and note $Q \le q_e < 2Q$),

c. set $z_e = q_e 2^{-r_e}$.

2. Let $\hat{G} = (V, E, z)$, and express

$\hat{G} = \sum_{i \ge 0} 2^{-i} G^i$,

where in each graph $G^i$ all edges have weight 1, and each edge appears in at most
$\lceil \log_2 2Q \rceil$ of these graphs.

3. Let $E^i$ be the edge set of $G^i$. Let $E^{\le i} = \cup_{j \le i} E^j$. For each $i$, let $D^i_1, \ldots, D^i_{\eta_i}$ be the
connected components of $V$ under $E^{\le i}$. For $i = 0$, set $\eta_i = 0$.

4. For each $i$ for which $E^i$ is non-empty,

a. Let $V^i$ be the set of vertices attached to edges in $E^i$.

b. Let $C^i_1, \ldots, C^i_{k_i}$ be the sets of form $D^{i-l}_j \cap V^i$ that are non-empty and have an
edge of $E^i$ on their boundary (that is, the interesting components of $V^i$ after
contracting edges in $E^{\le i-l}$). Let $W^i = \cup_j C^i_j$.

c. Let $\pi$ be the map of partition $C^i_1, \ldots, C^i_{k_i}$, and let $H^i$ be the contraction of
$(W^i, E^i)$ under $\pi$.

d. $\tilde{H}^i$ = BoundedSparsify$(H^i, \hat{\epsilon}, p/(2nl))$.

e. Let $\tilde{G}^i$ be a pullback of $\tilde{H}^i$ under $\pi$ whose edges are a subset of $E^i$.

5. Return $\tilde{G} = \sum_i 2^{-i} \tilde{G}^i$.
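Steps 1 and 2 of Sparsify can be sketched for a single edge as follows. This is an illustrative sketch rather than the authors' code: the function names `truncate_weight` and `planes_containing` are hypothetical, and exact rational arithmetic via `fractions.Fraction` is an assumption chosen so that the inequalities can be checked without rounding error.

```python
import math
from fractions import Fraction


def truncate_weight(w, Q):
    """Steps 1a-1c for one weight 0 < w <= 1.

    Returns (r, q, z) with Q <= 2**r * w < 2*Q, q = floor(2**r * w), z = q / 2**r.
    """
    w = Fraction(w)
    if not 0 < w <= 1:
        raise ValueError("expected a weight in (0, 1]")
    r = 0
    while (2 ** r) * w < Q:      # smallest r with 2**r * w >= Q
        r += 1
    q = math.floor((2 ** r) * w)
    z = Fraction(q, 2 ** r)
    return r, q, z


def planes_containing(r, q):
    """Indices i for which the edge appears in G^i, i.e. 2**-i occurs in z = q * 2**-r."""
    return sorted(r - bit for bit in range(q.bit_length()) if (q >> bit) & 1)


# Hypothetical example: epsilon = 1/4, so Q = ceil(6 / epsilon).
epsilon = Fraction(1, 4)
Q = math.ceil(6 / epsilon)
w = Fraction(1, 3)
r, q, z = truncate_weight(w, Q)
planes = planes_containing(r, q)
```

Because $q_e < 2Q$, the list returned by `planes_containing` has at most $\lceil \log_2 2Q \rceil$ entries, matching the bound stated in step 2.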

Lemma 10.4. Let $k_i$ denote the number of clusters described by Sparsify at step 4b. Then,

$\sum_i k_i \le 2nl$.

Proof. Let $\eta_i$ denote the number of connected components in the graph $(V, E^{\le i})$. Each cluster
$C^i_j$ has at least one edge of $E^i$ leaving it. As each pair of components under $E^{\le i-l}$ that are
joined by an edge of $E^i$ appear in the same component under $E^{\le i}$,

$\eta_i \le \eta_{i-l} - k_i/2$.

As the number of clusters never goes negative and is initially at most $n$, we may conclude

$\sum_i k_i \le 2nl$.

□



Theorem 10.5 (Sparsify). For $\epsilon \in (1/n, 1/3)$, $p \in (0, 1/2)$ and a weighted graph $G$ with
$n$ vertices in which every edge has weight at most 1, let $\tilde{G}$ be the output of Sparsify$(G, \epsilon, p)$. Then,

(X.1) The edges of $\tilde{G}$ are a subset of the edges of $G$; and

with probability at least $1 - p$,

(X.2) $\tilde{G}$ is a $(1 + \epsilon)$-approximation of $G$, and

(X.3) G has at most c^e^^n\o^^{n/p) edges, for some constant C5. 

Moreover, the expected running time 0/ Sparsify is O (mlog(l/p) log^'' n) . 

Proof. To establish property (X.1), it suffices to show that step 4e can actually be implemented.
That is, we need to know that all edges in $\tilde{H}^i$ can be pulled back to edges of $E^i$. This follows
from (B.1) and the fact that $H^i$ is a contraction of $(W^i, E^i)$.

We now establish that the graph $\hat{G}$ is a $(1 + 1/Q)$-approximation of $G$. We will then spend
the rest of the proof establishing that $\tilde{G}$ approximates $\hat{G}$. As the weight of every edge in $\hat{G}$ is
at most the corresponding weight in $G$, we have $\hat{G} \preccurlyeq G$. On the other hand, for every edge
$e \in E$, $w_e \le (1 + 1/Q) z_e$, so $G \preccurlyeq (1 + 1/Q)\hat{G}$, and $\hat{G}$ is a $(1 + 1/Q)$-approximation of $G$.

From Lemma 10.4 we know that there are at most $nl$ values of $i$ for which $k_i \ge 2$, and so
BoundedSparsify is called at most $nl$ times. Thus, with probability at least $1 - p$, the output
returned by every call to BoundedSparsify satisfies properties (B.2) and (B.3), and accordingly
we will assume that these properties are satisfied for the rest of the proof.

As each edge set $E^i$ has at most $n^2$ edges, the weight of every edge in graph $H^i$ is an integer
between 1 and $n^2$. So, by property (B.3), the number of edges in $\tilde{H}^i$, and therefore in $\tilde{G}^i$, is at
most



for some constant $c_5$, thereby establishing (X.3).

To establish (X.2), define for every $i$ the weight-1 graph $F^i = (V, E^{\le i})$, and observe that



C4r'^kilogn^log'^\ki/{p/{2nl))) < Cii-^kilog^^in^l/p). 



Applying Lemma 10.4, we may prove that the number of edges in $\tilde{G}$ is at most



^ar'^kilog^'^in'^l/p) < Cie-'^{2nl)\og^'^{nH/p) < cr^e~'^n\og^^{n/p), 



as $\epsilon > 1/n$,




We may apply (B.2) and Lemma 10.2 to show that

$\tilde{G}^i + c^2 n^3 F^{i-l}$

is a $(1 + \hat{\epsilon})(1 + 1/c)^2$-approximation of $G^i + c^2 n^3 F^{i-l}$. Summing over $i$ while multiplying the $i$th
term by $2^{-i}$, we conclude that

$\sum_{i \ge 0} 2^{-i}\left(\tilde{G}^i + c^2 n^3 F^{i-l}\right) = \tilde{G} + c^2 n^3 \sum_{i \ge 0} 2^{-i} F^{i-l} = \tilde{G} + 2c^2 n^3 2^{-l} \hat{G}$

is a $(1 + \hat{\epsilon})(1 + 1/c)^2$-approximation of

$\sum_{i \ge 0} 2^{-i}\left(G^i + c^2 n^3 F^{i-l}\right) = \hat{G} + c^2 n^3 \sum_{i \ge 0} 2^{-i} F^{i-l} = \hat{G} + 2c^2 n^3 2^{-l} \hat{G}$.

Setting

$\beta = 2c^2 n^3 2^{-l} \le 1/b$,

we have proved that $\tilde{G} + \beta\hat{G}$ is a $(1 + \hat{\epsilon})(1 + 1/c)^2$-approximation of $(1 + \beta)\hat{G}$, and so by
Proposition 10.6 below, $\tilde{G}$ is a

$(1 + \hat{\epsilon})(1 + 1/c)^2 (1 + \beta)$

approximation of $\hat{G}$. Property (X.2) now follows from the facts that $\hat{G}$ is a $(1 + 1/Q)$-approximation
of $G$, and

$(1 + \hat{\epsilon})(1 + 1/c)^2 (1 + \beta)(1 + 1/Q) \le (1 + \epsilon/6)^5 \le (1 + \epsilon)$,

for $\epsilon \le 1/2$.

To bound the expected running time of Sparsify, we observe that the time of the computation
is dominated by the calls to BoundedSparsify and the time required to actually form the
graphs $H^i$. The sets $D^i_j$ may be maintained using union-find [Tar75], and so incur a cost of at
most $O(n \log n)$ over the course of the algorithm. Each graph $H^i$ may be formed by determining
the component of each of its edges, at a cost of $O(|E^i| \log n)$. So, the time to form the graphs
$H^i$ can be bounded by

$O\left(\sum_i |E^i| \log n\right) = O(m \lceil \log 2Q \rceil \log n) = O(m \log(1/\epsilon) \log n)$.

This is dominated by our upper bound on the time required in the calls to BoundedSparsify,
which is

O log'^ ls(l/p) log"*^^ = O [m log(l/e) log n lg(l/p) log^^ n) = O [m log(l/p) log^'^ n) . 

□ 
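The running-time argument above relies on maintaining the components $D^i_j$ with union-find as the edge sets $E^i$ are added in order of $i$. The following is a minimal sketch of that bookkeeping, assuming a dictionary-based structure with union by size and path compression; the class name `UnionFind` and the toy edge levels are hypothetical and serve only to show the component count $\eta_i$ decreasing as levels are merged.

```python
class UnionFind:
    """Disjoint sets with union by size and path compression."""

    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}
        self.size = {v: 1 for v in self.parent}
        self.components = len(self.parent)

    def find(self, v):
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:          # compress the path to the root
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.components -= 1
        return True


# Hypothetical edge sets E^0, E^1, E^2 on vertices 0..4.
levels = {0: [(0, 1)], 1: [(1, 2), (3, 4)], 2: [(0, 2)]}
uf = UnionFind(range(5))
counts = []                      # number of components after adding each level
for i in sorted(levels):
    for u, v in levels[i]:
        uf.union(u, v)
    counts.append(uf.components)
```

Looking up `uf.find(u)` for the endpoints of each edge of $E^i$ is what the text calls determining the component of each edge.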

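Proposition 10.6. If $\beta, \gamma \le 1/2$ and $\tilde{G} + \beta G$ is a $(1 + \gamma)$-approximation of $(1 + \beta)G$, then $\tilde{G}$
is a $(1 + \beta)(1 + \gamma)$-approximation of $G$.

Proof. We have

$\tilde{G} + \beta G \preccurlyeq (1 + \gamma)(1 + \beta) G$,

which implies

$\tilde{G} \preccurlyeq (1 + \gamma)(1 + \beta) G$.

On the other hand,

$(1 + \beta) G \preccurlyeq (1 + \gamma)\left[\tilde{G} + \beta G\right]$ implies

$(1 - \beta\gamma) G \preccurlyeq (1 + \gamma) \tilde{G}$, which implies

$G \preccurlyeq \frac{1 + \gamma}{1 - \beta\gamma} \tilde{G} \preccurlyeq (1 + \beta)(1 + \gamma) \tilde{G}$,

under the conditions $\beta, \gamma \le 1/2$. □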
10.3 Bounding Blow-Up 

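When we approximate a graph $G = (V, E, w)$ by a graph $\tilde{G} = (V, \tilde{E}, \tilde{w})$ with $\tilde{E} \subseteq E$, we define
the blow-up of an edge $e \in E$ by

$\text{blow-up}_{\tilde{G}}(e) \stackrel{\mathrm{def}}{=} \tilde{w}_e / w_e$ if $e \in \tilde{E}$, and 0 otherwise.

Similarly, we define the blow-up of a vertex $v$ to be

$\text{blow-up}_{\tilde{G}}(v) \stackrel{\mathrm{def}}{=} \frac{1}{d_v} \sum_{(u,v) \in E} \text{blow-up}_{\tilde{G}}((u, v))$.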

The algorithm in [ST08b] for solving linear equations requires sparsifiers in which every vertex
has bounded blow-up. While the sparsifiers output by UnwtedSparsify and BoundedSparsify
satisfy this condition with high probability, the sparsifiers output by Sparsify do not. The
reason is that nodes of low degree can become part of clusters $C^i_j$ with many edges of $E^i$ on
their boundary. These clusters can become vertices of high degree in the contraction by $\pi$, and
so can become attached to edges of high blow-up when they are sparsified.

This problem may be solved by making two modifications to Sparsify. First, we sub-divide
the clusters $C^i_j$ so all the vertices in each cluster have approximately the same degree, and so
that the degree of every vertex in $H^i$ is at most four times the degree of the vertices that map
to it. Then, we set $\tilde{G}^i$ to be a random pullback of $\tilde{H}^i$ whose edges are a subset of $E^i$. That
is, for each edge $(c, d) \in \tilde{H}^i$ we pull it back to a randomly chosen edge $(a, b) \in E^i$ for which
$\pi(a) = c$ and $\pi(b) = d$. In this way we may guarantee with high probability that no vertex has
high blow-up. We now describe the corresponding algorithm Sparsify2 by just listing the lines
that differ from Sparsify.
$\tilde{G}$ = Sparsify2$(G, \epsilon, p)$, where $G = (V, E, w)$ has all edge-weights at most 1.

4a. Let $V_\delta$ be the set of vertices in $V$ with degrees in $[2^{\delta}, 2^{\delta+1})$. Let $V^i$ be the set of
vertices attached to edges in $E^i$. Let $V^i_\delta$ be the set of vertices in $V_\delta \cap V^i$.

4b. For each $\delta$, let ${}^{\delta}C^i_1, \ldots, {}^{\delta}C^i_{k^{\delta}_i}$ be the sets of form $D^{i-l}_j \cap V^i_\delta$ that are non-empty and
have an edge of $E^i$ on their boundary. Let $W^i = \cup_{j,\delta}\, {}^{\delta}C^i_j$. For each set ${}^{\delta}C^i_j$ that
has more than $2^{\delta+2}$ edges of $E^i$ on its boundary, sub-divide the set until each part
has between $2^{\delta}$ and $2^{\delta+2}$ edges on its boundary. [We will give a procedure to do the
subdivision in the paragraph immediately after this algorithm]. Let ${}^{\delta}C^i_1, \ldots, {}^{\delta}C^i_{t^{\delta}_i}$ be
the resulting collection of sets.

4c. Let $\pi$ be the map of partition of $W^i$ by the sets $\{{}^{\delta}C^i_j\}_{j,\delta}$, and let $H^i$ be the contraction
of $(W^i, E^i)$ under $\pi$.

4e. Let $\tilde{H}^i$ = BoundedSparsify$(H^i, \hat{\epsilon}, p/(c_8 n l \log n))$. Let $\tilde{G}^i$ be a random pullback of $\tilde{H}^i$
under $\pi$ whose edges are a subset of $E^i$.



We should establish that it is possible to sub-divide the clusters as claimed in step 4b. To
see this, recall that each vertex in a set ${}^{\delta}C^i_j$ has degree at most $2^{\delta+1}$. So, if we greedily pull off
vertices one by one to form a new set, each time we move a vertex the boundary of the new set
will increase by at most $2^{\delta+1}$ and the boundary of the old set will decrease by at most $2^{\delta+1}$.
Thus, at the point when the size of the boundary of the new set first exceeds $2^{\delta}$, the size of
the boundary of the old set must be at least $2^{\delta+2} - 2^{\delta} - 2^{\delta+1} \ge 2^{\delta}$. So, one can perform the
subdivision in step 4b by a naive greedy algorithm.
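The naive greedy subdivision described above might look as follows. This is a sketch under stated assumptions, not the procedure analyzed in the running-time bound: the names `boundary_size` and `greedy_subdivide` are hypothetical, the graph is given as an adjacency dictionary over the edges of $E^i$, and the boundary is recomputed from scratch at every step for clarity rather than updated incrementally.

```python
def boundary_size(part, adjacency):
    """Number of edges with exactly one endpoint in `part`."""
    part = set(part)
    return sum(1 for u in part for v in adjacency[u] if v not in part)


def greedy_subdivide(cluster, adjacency, delta):
    """Split `cluster` greedily, closing a new part once its boundary exceeds 2**delta.

    Subdivision continues while the remaining set has more than 2**(delta + 2)
    boundary edges, as in step 4b.
    """
    threshold = 2 ** delta
    remaining = list(cluster)
    parts = []
    while boundary_size(remaining, adjacency) > 4 * threshold:
        new_part = []
        while boundary_size(new_part, adjacency) <= threshold:
            new_part.append(remaining.pop())   # pull off one vertex at a time
        parts.append(new_part)
    if remaining:
        parts.append(remaining)
    return parts


# Hypothetical cluster: six vertices, each with two edges leaving the cluster,
# so every degree lies in [2**1, 2**2) and delta = 1.
adjacency = {f"c{k}": [f"o{2 * k}", f"o{2 * k + 1}"] for k in range(6)}
cluster = [f"c{k}" for k in range(6)]
parts = greedy_subdivide(cluster, adjacency, delta=1)
```

In this example the cluster starts with 12 boundary edges, more than $2^{\delta+2} = 8$, and is split into two parts whose boundaries both lie between $2^{\delta}$ and $2^{\delta+2}$.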

Theorem 10.7 (Sparsify2). For $\epsilon \in (1/n, 1/3)$, $p \in (0, 1/2)$ and a weighted graph $G$ with $n$
vertices, let $\tilde{G}$ be the output of Sparsify2$(G, \epsilon, p)$. Then,

(Y.1) the edges of $\tilde{G}$ are a subset of the edges of $G$; and,

with probability at least $1 - (4/3)p$,

(Y.2) $\tilde{G}$ is a $(1 + \epsilon)$-approximation of $G$, and

(Y.3) G has at most CQe~'^n\o^'^{n/p) edges, for some constant cq, 
(Y.4) every vertex has blow-up at most 2. 

Moreover, the expected running time o/Sparsify2 is O (m log(l/p) log^^ n) . 

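Proof. To prove (Y.3), we must bound the number of clusters, $\sum_{\delta, i} t^{\delta}_i$, produced in the modified
step 4b. From Lemma 10.4 we know that, for each $\delta$,

$\sum_i k^{\delta}_i \le 2ln$. (22)

To bound $\sum_i t^{\delta}_i$, let $\partial_{E^i}(W)$ denote the set of edges in $E^i$ leaving a set of vertices $W$. Let $S^{\delta}_i$
be the set of $j$ for which ${}^{\delta}C^i_j$ was created by subdivision, and recall that for all $j \in S^{\delta}_i$,

$|\partial_{E^i}({}^{\delta}C^i_j)| \ge 2^{\delta}$.

So,

$\sum_{j \in S^{\delta}_i} |\partial_{E^i}({}^{\delta}C^i_j)| \ge 2^{\delta} |S^{\delta}_i| \ge 2^{\delta}(t^{\delta}_i - k^{\delta}_i)$. (23)

As vertices in $V_\delta$ have at most $2^{\delta+1}$ edges and each edge of $G$ only appears in at most $\lceil \log 2Q \rceil$
sets $E^i$,

$\sum_i \sum_j |\partial_{E^i}({}^{\delta}C^i_j)| \le \lceil \log 2Q \rceil 2^{\delta+1} |V_\delta|$. (24)

Combining (23) with (24) and (22), we get

$\sum_i t^{\delta}_i \le 2 \lceil \log 2Q \rceil |V_\delta| + 2ln$,

and so

$\sum_{\delta} \sum_i t^{\delta}_i \le 2 \lceil \log 2Q \rceil n + 2ln \lceil \log 2n \rceil \le c_8 n l \log n$,

for some constant $c_8$. By now applying the analysis from the proof of Theorem 10.5 we may
prove that (Y.2) and (Y.3) hold with probability at least $1 - p$. Of course, property (Y.1) always
holds.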

To prove property (Y.4), we note that the blow-up of a vertex $v$ is the sum of $1/d_v$ times the
blow-up of each of its edges. We prove in Lemma 10.8 that the expectation of this sum is at most 1,
and in Lemma 10.9 that each term is bounded by

$\frac{1}{48 \log(3n/p)^2}$.

If the variables were independent, we could apply Theorem 6.8 to prove it is unlikely that $v$ has
blow-up greater than 2.

However, the variables are not independent. The blow-ups of edges output by BoundedSparsify
are independent. But the choice of a random pullback at line 4e introduces correlations in the
blow-up of edges. Fortunately, the blow-ups of edges attached to $v$ have a negative association
(as may be proved by Proposition 8 and Lemma 9 of Dubhashi and Ranjan [DR98]). Thus, by
Proposition 7 of [DR98], we may still apply Theorem 6.8 with $\epsilon = 1$ and $\mu = 1$ to show that

$\Pr\left[\text{blow-up}_{\tilde{G}}(v) > 2\right] \le e^{-48 \log(3n/p)^2/3}$.

Applying a union bound over the vertices $v$, we see that (Y.4) holds with probability at least
$1 - p/3$.

The analysis of the running time of Sparsify2 is similar to the analysis of Sparsify, except
for the work required to sub-divide sets in step 4b, which we now analyze. Each time a vertex is
removed from a set ${}^{\delta}C^i_j$ during the subdivision, the work required by a reasonable implementation
is proportional to the degree of that vertex in graph $G^i$. So, the work required to perform all
the subdivisions over the course of the algorithm is at most

$\sum_{\delta} \sum_i 2^{\delta+1} |S^{\delta}_i|$.

As

$|\partial_{E^i}({}^{\delta}C^i_j)| \ge 2^{\delta}$

whenever we subdivide ${}^{\delta}C^i_j$, we have

$\sum_{j \in S^{\delta}_i} |\partial_{E^i}({}^{\delta}C^i_j)| \ge 2^{\delta} |S^{\delta}_i|$.

Now, by (24),

$\sum_i 2^{\delta} |S^{\delta}_i| \le \lceil \log 2Q \rceil 2^{\delta+1} |V_\delta| \le 2 \lceil \log 2Q \rceil \mathrm{Vol}(V_\delta)$.

Thus,

$\sum_{\delta} \sum_i 2^{\delta+1} |S^{\delta}_i| \le 4 \lceil \log 2Q \rceil \mathrm{Vol}(V) = O(m \log(1/\epsilon))$.

The stated bound on the expected running time of Sparsify2 follows.

□



Lemma 10.8. Let $\tilde{G} = (V, \tilde{E}, \tilde{w})$ be the graph output by Sparsify2 on input $G = (V, E, w)$.
Then, for every $e \in E$,

$\mathbf{E}\left[\text{blow-up}_{\tilde{G}}(e)\right] \le 1$. (25)

Proof. We first observe that

$\mathbf{E}\left[\text{blow-up}_{\tilde{G}}(e)\right] = 1$ (26)

holds for the graph $\tilde{G}$ output by Sample as it takes a weight-1 graph as input, selects a probability
$p_e$ for each edge, and includes it at weight $1/p_e$ with probability $p_e$. As UnwtedSparsify merely
partitions its input into edge-disjoint subgraphs and then applies Sample to some of them, (26)
holds for the output of UnwtedSparsify as well.

To show that (26) holds for the graph output by BoundedSparsify, for each edge $e \in E$ and
for each $i$ set

$w^i_e = 1$ if $e \in G^i$, and $w^i_e = 0$ otherwise.

We have

$w_e = \sum_i 2^i w^i_e$.

For the graph $\tilde{G}^i$ returned on line 2 of BoundedSparsify, let $\tilde{G}^i = (V, \tilde{E}^i, \tilde{w}^i)$. We have
established that

$\mathbf{E}\left[\tilde{w}^i_e\right] = w^i_e$.

So,

$\mathbf{E}\left[\text{blow-up}_{\tilde{G}}(e)\right] = \mathbf{E}\left[\frac{\sum_i 2^i \tilde{w}^i_e}{w_e}\right] = \frac{\sum_i 2^i \mathbf{E}[\tilde{w}^i_e]}{w_e} = \frac{\sum_i 2^i w^i_e}{w_e} = 1$,

establishing (26) for the output of BoundedSparsify.

Applying similar reasoning, we may establish (25) for the output of Sparsify2 by proving
that for each edge $e$ in each weight-1 graph $G^i$, the expected blow-up of $e$ in $\tilde{G}^i$ is at most 1. If $e$
is not on the boundary of a set ${}^{\delta}C^i_j$, then $e$ will not appear in $\tilde{G}^i$ and so its blow-up will be zero.
If $e = (u, v)$ is on the boundary, then let $w_p$ denote the number of edges $e' = (u', v')$ for which
$\pi(u) = \pi(u')$ and $\pi(v) = \pi(v')$. If we let $H^i = (Y, F, y)$ and $\tilde{H}^i = (Y, \tilde{F}, \tilde{y})$, then $w_p = y(\pi(u), \pi(v))$.

Now, let $f$ be the edge $(\pi(u), \pi(v))$ in $H^i$. We know that $\mathbf{E}\left[\text{blow-up}_{\tilde{H}^i}(f)\right] = 1$. If $f$ appears
in $\tilde{H}^i$, then the probability that edge $e$ is chosen in the random pullback is $1/w_p$. As $f$ has weight
$w_p$, we find

$\mathbf{E}\left[\text{blow-up}_{\tilde{G}^i}(e)\right] = \frac{1}{w_p}\left(w_p\, \mathbf{E}\left[\text{blow-up}_{\tilde{H}^i}(f)\right]\right) = 1$.

□
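The last step of the proof can be checked on a toy case with exact arithmetic. The sketch below is an illustration only: the function name `expected_blowup_after_pullback` and the two-outcome distribution for the contracted edge $f$ are assumptions, not part of the paper. It computes the expected blow-up of one of $w_p$ parallel weight-1 edges when $f$ (of weight $w_p$) is sparsified and then pulled back to a uniformly random representative.

```python
from fractions import Fraction


def expected_blowup_after_pullback(multiplicity, sparsifier_outcomes):
    """Expected blow-up of one of `multiplicity` parallel weight-1 edges.

    sparsifier_outcomes lists (probability, weight) pairs for the contracted
    edge f, whose original weight equals `multiplicity`.
    """
    expectation = Fraction(0)
    for probability, weight in sparsifier_outcomes:
        # A uniformly random pullback selects this particular edge with
        # probability 1/multiplicity and gives it the whole weight of f.
        expectation += Fraction(probability) * Fraction(1, multiplicity) * Fraction(weight)
    return expectation


# Hypothetical sparsifier: f has weight 4 and is kept at weight 8 with
# probability 1/2 (blow-up 2) or dropped, so its expected blow-up is 1.
outcomes = [(Fraction(1, 2), 8), (Fraction(1, 2), 0)]
expected = expected_blowup_after_pullback(4, outcomes)
```

The factors $1/w_p$ and $w_p$ cancel exactly as in the displayed equation, so the expectation for the original edge equals the expected blow-up of $f$.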

Lemma 10.9. Let $\tilde{G} = (V, \tilde{E}, \tilde{w})$ be the graph output by Sparsify2 on input $G = (V, E, w)$.
Then, for every $(u, v) \in E$,

$\text{blow-up}_{\tilde{G}}(u, v) \le \frac{\min(d_u, d_v)}{48 \log(3n/p)^2}$. (27)

Proof. As in the proof of the previous lemma, we work our way through the algorithms one by
one. The graph produced by the algorithm Sample has blow-up at most $\min(d_u, d_v)/(16 \log(3/p))^2$
for every edge $(u, v)$. As UnwtedSparsify only calls Sample on subgraphs of its input graph, a
similar guarantee holds for the output of UnwtedSparsify. In fact, as UnwtedSparsify calls
Sample with $\hat{p} \le p/n$, every edge output by UnwtedSparsify actually has blow-up less than

$\min(d_u, d_v)/(16 \log(3n/p))^2$.

As BoundedSparsify merely calls UnwtedSparsify on a collection of graphs that sum to $G$, the
same bound holds on the blow-up of the graph output by BoundedSparsify.

To bound the blow-up of edges in the graph output by Sparsify2, note that for every $i$ and
every vertex $a$ in a graph $H^i$, the vertices $v$ of the original graph that map to $a$ under $\pi$ satisfy

$d_a \le 4 d_v$,

where $d_v$ refers to the degree of vertex $v$ in the original graph and $d_a$ is the degree of vertex $a$
in graph $H^i$. So, the blow-up of every edge $(u, v) \in E^i$ satisfies

$\text{blow-up}_{\tilde{G}^i}(u, v) \le \frac{4 \min(d_u, d_v)}{(16 \log(3n/p))^2} \le \frac{\min(d_u, d_v)}{48 \log(3n/p)^2}$.

We now measure the blow-up of edges relative to $\hat{G}$ instead of $G$, which can only over-estimate
their blow-up. The lemma then follows from

$\text{blow-up}_{\tilde{G}}(u, v) = \sum_i \frac{2^{-i}\, \text{blow-up}_{\tilde{G}^i}(u, v)}{z_{u,v}} \le \frac{\min(d_u, d_v)}{48 \log(3n/p)^2} \sum_{i : (u,v) \in E^i} \frac{2^{-i}}{z_{u,v}} = \frac{\min(d_u, d_v)}{48 \log(3n/p)^2}$.

□
11 Final Remarks



Since the initial announcement [ST04] of our results, significant improvements have been made
in spectral sparsification. Spielman and Srivastava [SS08] have proved that spectral sparsifiers
with 0(nlogn/e^) edges exist, and may be found in time O (mlog(nl^/e)) where W is the ratio 
of the largest weight to the smallest weight of an edge in the input graph. Their nearly-linear 
time algorithm relies upon the solution of a logarithmic number of linear systems in
diagonally-dominant matrices. Until recently, the only nearly-linear time algorithm for solving such systems
was the algorithm in [ST08b], which relied upon the constructions in this paper. Recently,
Koutis, Miller and Peng [KMP10] have developed a faster algorithm that does not rely on the
sparsifier construction of the present paper. Their algorithm finds a-approximate solutions to 
Laplacian linear systems in time 0(m log^ n log a~^). One may remove the dependence on W 
in the running time of the algorithm of [SS08] through the procedure described in Section 10 of
this paper. Batson, Spielman and Srivastava [BSS09] have shown that sparsifiers with $O(n/\epsilon^2)$
edges exist, and present a polynomial-time algorithm that finds these sparsifiers. It is our hope
that sparsifiers with so few edges may also be found in nearly-linear time.

Andersen, Chung and Lang [ACL06] and Andersen and Peres [AP09] have improved upon
some of the core algorithms we presented in [ST08a] and in particular have improved upon
the algorithm Partition upon which we based ApproxCut. The algorithm of Andersen and
Peres [AP09J is both significantly faster and saves a factor of log^ m in the conductance of the 
set it outputs. In particular, it satisfies guarantee (-P.3) with the term 0(r^/ log n) in place of 
our function /i(t). 
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